## **Göttinger Studien zur Entwicklungsökonomik Göttingen Studies in Development Economics**

Edited by Hermann Sautter and Stephan Klasen

Vol. 31

Maria Ziegler · Institutions, Inequality and Development

Maria Ziegler, born in Stollberg (Germany) in 1980, studied Public Policy and Management at the University of Konstanz and the University Paris 1, Panthéon-Sorbonne. Between 2006 and 2010 she was a Ph.D. candidate at the Department of Economics at the University of Göttingen. During her studies she also worked as a consultant for various international agencies in Latin America and Africa.

The book focuses on the linkages between institutions, inequality and development. It analyzes formal political institutions, in particular the relationship between democracy and human development. It also centers on informal social institutions leading to the exclusion of population groups such as women and indigenous people. To measure these institutions in the case of gender inequality the Social Institutions and Gender Index (SIGI) and its five subindices are proposed and for ethnic inequality dummy variables indicating ethnic origin are used. The dissertation shows that formal and informal institutions affect human development, the governance of a society and inequality.

www.peterlang.de ISBN 978-3-631-60541-7


Maria Ziegler

# Institutions, Inequality and Development

### **Bibliographic Information published by the Deutsche Nationalbibliothek**

The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data is available on the internet at http://dnb.d-nb.de.

Open Access: The online version of this publication is published on www.peterlang.com and www.econstor.eu under the international Creative Commons License CC-BY 4.0. Learn more on how you can use and share this work: http://creativecommons.org/licenses/by/4.0.

All versions of this work may contain content reproduced under license from third parties.

Permission to reproduce this third-party content must be obtained from these third-parties directly.

This book is available Open Access thanks to the kind support of ZBW – Leibniz-Informationszentrum Wirtschaft.

Cover design: Olaf Glöckler, Atelier Platen, Friedberg

Cover illustration by Rolf Schinke

Gratefully acknowledging the support of the Ibero-Amerika-Institut für Wirtschaftsforschung, Göttingen.

ISBN 978-3-653-00576-9 (eBook)

D 7 ISSN 1439-3395 ISBN 978-3-631-60541-7

© Peter Lang GmbH Internationaler Verlag der Wissenschaften Frankfurt am Main 2011

*To my wonderful grandparents*

# **Editors' Preface**

Progress in achieving the Millennium Development Goals has been uneven and unsatisfactory in many dimensions. In this volume, Maria Ziegler examines the unsatisfactory progress by highlighting the role of informal and formal institutions structuring social interactions, distributing power in a society and therefore affecting the freedoms to choose a life according to one's needs and preferences.

The first essay investigates the influence of democracy on progress in education and health in developing countries. The theoretical part first asks why democracy influences health and education, focusing on redistribution as well as accountability and responsiveness in political systems. Secondly, it addresses the question of whether this effect depends upon other factors such as inequality, the level of development, education of the population and ethnic diversity. Using international panel data, the essay finds a robust positive and significant effect of democracy on health and education. However, the interaction effects of democracy with GDP per capita, inequality, ethnic fractionalization and education turn out to be insignificant or not robust. Interpreted cautiously, democratic institutions themselves matter for human development, rather than the circumstances under which they occur.

There is another type of institution, namely informal social institutions, that should not be neglected in the study of development outcomes. These informal institutions are often taken for granted, and provide role models and social exclusion mechanisms. Those social institutions that are related to gender inequality and distribute power between men and women in daily life form the focus of the next three essays.

Essay 2 centers on the measurement of social institutions related to gender inequality, proposing the Social Institutions and Gender Index (SIGI) and its five subindices Family code, Civil liberties, Physical integrity, Son preference and Ownership rights, which are now officially used by the OECD Development Centre. In a first step, the five one-dimensional subindices are constructed by aggregating variables of the OECD Gender, Institutions and Development Database with polychoric principal component analysis. In a second step, the subindices are combined using the Foster-Greer-Thorbecke poverty measurement approach to calculate the SIGI. Preliminary analyses show that the SIGI is empirically non-redundant to other gender-related indices and can be used to compare the societal situation of women in over 100 developing countries.

Essay 3 investigates whether the newly proposed indices can explain development outcomes such as female education, child mortality, fertility and governance (rule of law and civil liberties). In particular, the study aims at separating the explanatory value of the SIGI from that of religion, region, the political system and income. The theoretical motivation is based on household bargaining and investment models. The empirical results show a robust significant effect of at least one of the social institutions indices on the development outcomes. Controlling for religion, political system, geography and the level of economic development, higher inequality in social institutions related to gender is associated with worse development outcomes.

Essay 4 concentrates on the relationship between social institutions and gender inequality and governance focusing on corruption. Embedded into the literature on gender inequality and corruption, the study highlights that a worse social status of women in a society measured by a higher inequality in social institutions related to gender is associated with a higher perceived level of corruption in a society even if one controls for representation of women in the society and democracy as well as other factors proposed by the literature.

The last essay focuses on another marginalized group, the indigenous population, whose situation is not clearly covered by the Millennium Development Goals but deserves attention as indigenous people are overrepresented among the world's poor. Essay 5 analyzes the relationship between ethnic origin and health inequality in Bolivia and shows that social exclusion and institutional mechanisms, measured with significant dummies for ethnic origin, are relevant factors for racial differences in health. However, this perspective might lead to unsuccessful policy interventions as it does not consider other factors that are associated with both ethnic origin and health, such as material wealth, urban-rural differences, geographical location and other household and maternal characteristics. The two major results are, first, that ethnic origin matters but that there is heterogeneity in health outcomes within the indigenous population. Secondly, health knowledge and mother's education could be responsible for differences in health outcomes between ethnic groups, and the role of both variables as a pathway between ethnic origin and health outcomes should be investigated further.

Overall, Maria Ziegler's volume makes an important contribution to the empirical literature on the linkage between institutions, inequality and economic and human development.

## **Acknowledgements**

The way was long, the way was hard, but it was worth the effort.

First of all, I would like to thank my supervisor Professor Stephan Klasen for giving me the opportunity to write this dissertation and to work in a dynamic, international environment. He supported me during the whole time span of the dissertation, he was demanding but also understanding. His scientific input was always of high value for my work and I have to admit that I am always impressed by the diversity of knowledge he has and the range of the projects he manages. I would also like to thank my second supervisor Junior Professor Carola Grün for her kind and constructive comments, which improved my work considerably. Professor Matin Qaim agreed to be the third examiner of my doctoral dissertation and I would like to thank him for this.

My colleagues deserve mentioning as well, as they filled my days with humor, motivation, understanding and inspiration. I thank my co-authors Sebastian Vollmer, Boris Branisa and Elena Gross. In particular, I thank Michaela Beckmann for administrative help, Felicitas Nowak-Lehmann Danziger for her warm and scientific support at the end of the thesis and I thank Kenneth Harttgen, Inmaculada Martínez-Zarzoso, Adriana Cardozo, Jan Priebe, Johannes Gräb and Tobias Lechtenfeld for scientific advice. I also want to thank all my other colleagues - there are too many to name them all.

On my way many more scientists gave me helpful comments. I thank Christian Bjørnskov, Mark Dincecco, Isabel Günther, Dierk Herzer, Johannes Jütting, Denis Drechsler, Tatyana Krivobokova, Juan R. de Laiglesia, Oleg Nenadic, Jean-Marc Siroën, Stefan Sperlich, Walter Zucchini and other anonymous referees. In addition, I thank the participants of the following conferences: The Institutional and Social Dynamics of Growth and Distribution (Pisa, 2007), Conference of the International Society for Comparative Economic Studies (São Paulo, 2008), International Economic Association's conference (Istanbul, 2008), American Economic Association's annual conference (San Francisco, 2009), International Conference on Gender and the Global Economic Crisis (New York, 2009), North American Summer Meeting of the Econometric Society (Boston, 2009), Far East and South Asia Meeting of the Econometric Society (Tokyo, 2009) and Singapore Economic Review Conference (Singapore, 2009).

I also want to thank the German National Academic Foundation, which provided me with a scholarship without which it would not have been possible to write this dissertation. I also acknowledge travel funds from the University of Göttingen, the Universitätsbund Göttingen and the Verein für Socialpolitik.

Finally, my deepest thanks go to my grandparents and André as well as all friends, in particular Sophia, Karin, Susanna, Laura, Franzi, Theresa, Gesine, Anne, Doreen, Knut, Julian, Felix, Marco and Tobi, who encouraged me to go this way. They gave me stability during times of crisis and support whenever I needed it. Last but not least, I thank my parents, who believed in me and supported me as long as they could.

## **Introduction**

### **The State of Development**

In September 2010, the Summit on the Millennium Development Goals (MDGs) - the high-level plenary meeting of the General Assembly - will take place to review the implementation of the MDGs and to identify areas of action to achieve them by 2015. Although some progress in terms of fighting poverty and hunger, improving health and education and other aspects of the MDGs has been achieved, progress has been uneven. In 2005, there were still 1.4 billion people living on less than \$1.25 a day and further progress has been reversed or delayed due to the world economic crisis (United Nations, 2009). The number of people suffering from hunger rose to 1.02 billion in 2009, 129 million children were underweight and 195 million under the age of 5 were stunted (United Nations, 2010). Progress towards achieving universal primary education in developing countries has been noticeable, though more than 10% of children of primary school age are still not enrolled (United Nations, 2010). Under-five mortality declined from 93 per 1,000 live births in 1990 to 67 per 1,000 live births in 2007, which is still short of the goal of reducing child mortality by two thirds between 1990 and 2015 (United Nations, 2009).

A look at the overall performance on these or other indicators neglects large disparities due to income itself, urban-rural differences and other inequalities related to gender, language, ethnicity or disability. Social exclusion and a lack of participation have been diagnosed as the main drivers of group-based disparities and represent a further dimension of poverty (United Nations, 2009, 2010). This is mainly reflected in MDG 3, "Promote gender equality and empower women". However, progress regarding gender equality remains low. In 2007, out of 171 countries only 53 had achieved gender parity in primary and secondary education. The gender gap in secondary schooling is even more severe and particularly evident in Sub-Saharan Africa, where the girls' to boys' enrolment ratio was only 79% in 2007. Moreover, gender gaps persist on the labor market and in the political arena. For example, only 18% of the parliamentary seats were held by women in January 2009 (United Nations, 2009, 2010).

The picture given here shows that there still remains much to be done and makes clear that actions have to be identified in the multiple dimensions of development to accelerate progress towards achieving the MDGs. Such a multidimensional view towards development constitutes the basis of this work. It is inspired by Sen's notion of development as freedom or expansion of capabilities (e.g. Sen, 1999b, 2003). Sen's capability approach is based on the two concepts of functionings and capabilities. Functionings are the "doings and beings" of a person, her actions and the status that she values and enjoys, like being healthy, being educated, achieving self-respect or participating in social life. Capabilities refer to different combinations of functionings a person is able to achieve, covering the notion of freedom to choose the kind of life one would like. With his approach, Sen inspired the emergence of the pluralist and integrative conception of "human development" and its operationalization in the Human Development Index of the United Nations Development Program (UNDP). It is not only income but also health and education among other factors that enable people to live the life they value.

Sen leaves the question of what are valuable achievements and freedoms wide open and does not make explicit a list of fundamental universal capabilities (Nussbaum, 2003; Gaspers and van Staveren, 2003). However, he highlights the importance of public deliberation, participatory processes and political freedoms for social choice and the constitution of values and development goals, as in such a context people are able to advance their own case and act as agents of the development process.

### **Political Institutions and Human Development**

Sen's discussion about valuable capabilities and the formation of these values centers on social interactions and draws attention to institutions in general. North (2001, p. 97) defines institutions as "the humanly devised constraints that structure political, economic and social interaction. They consist of both informal constraints [...] and formal rules [...]." Institutions are the rules of the game. They create order, reduce uncertainty and affect the prosperity of a nation by reducing transaction costs and regulating contract enforcement and property rights protection. A very important feature is the distributional effect of institutions (World Bank, 2005). In particular, institutions distribute power in a society and therefore they affect the capabilities of people to choose between different ways of living. Sen (1999b) emphasizes democratic political institutions that create the environment for social choice and value formation where all people can actively and equally participate in an open deliberation process. Therefore, besides the intrinsic value of democracy, democratic institutions help to produce responsive policies and to hold politicians and bureaucrats accountable. It is the purpose of **Essay 1** to investigate whether Sen's argument withstands empirical evidence and to answer the question of which political system is the best for obtaining a high level of human development measured with the non-income dimensions education and health.

There are in fact examples that challenge Sen's claim. Present-day Singapore, an autocracy, is a high income country with a high life expectancy at birth of 80 years and a high literacy rate of 94%.<sup>1</sup> The development path of his own country inspired Singapore's former Prime Minister, Lee Kuan Yew, to put forward the famous Lee hypothesis according to which authoritarian rule is more efficient than democratic government and therefore beneficial to economic development (and thus to human development as well) (Sen, 1999a). Also, relatively poor Cuba has managed to achieve a very high life expectancy at birth of 79 years and an adult literacy rate of almost 100%.<sup>2</sup> On the other hand, the democratic country of India, for example, has a life expectancy at birth of only 63 years and a literacy rate of 66%.<sup>3</sup>

Sen (1999a, p. 6) calls this "sporadic empiricism" and this is certainly true. Nevertheless, controversies put forward in the theoretical literature do uphold the question about the power of democracy. First, there is a controversy concerning the contradictory effects of property rights protection and redistribution in democratic societies on growth and well-being (e.g. Mohtadi and Roe, 2003; Alesina and Rodrik, 1994; Baum and Lake, 2003). Secondly, the causal direction is not clear: is democracy a cause or a consequence of the development process (e.g. Lipset, 1959; Persson and Tabellini, 2007a; Glaeser et al., 2007)? Thirdly, there is a debate as to which conditions are necessary for democracy to have a positive effect on human development (e.g. Keefer and Khemani, 2005). Empirical studies do not provide a coherent answer to these questions and they have their limitations (e.g. Lake and Baum, 2001; Baum and Lake, 2003; Navia and Zweifel, 2003; Franco et al., 2004; Besley and Kudamatsu, 2006; Tsai, 2006; Safaei, 2006; Ross, 2006). They are either restricted to only one non-income dimension of human development or to a cross-sectional analysis leaving out developments over time. Furthermore, they do not sufficiently account for possible conditions influencing democracy's performance.

Acknowledging these shortcomings, **Essay 1**, which is based on joint work with Sebastian Vollmer, extends the existing literature in several ways. First, the essay emphasizes the redistributive effects of democracy and complements Sen's theoretical argument using the well-known median voter theory to illustrate why democracy should outperform autocracy with respect to health and education (Meltzer and Richard, 1981). A second contribution consists in identifying conditions (income inequality, the level of economic development, education and ethnic fractionalization) that are assumed to affect democracy's performance. Using a panel analysis over a time span of 30 years, the relationship between political institutions, life expectancy at birth and the literacy rate is tested and interaction effects are included to account for factors that affect the functioning of democracy.

<sup>1</sup>See http://hdrstats.undp.org/en/countries/country\_fact\_sheets/cty\_fs\_SGP.html (date of access, May 2010). Reference year is 2007.

<sup>2</sup>See http://hdrstats.undp.org/en/countries/country\_fact\_sheets/cty\_fs\_CUB.html (date of access, May 2010). Reference year is 2007.

<sup>3</sup>See http://hdrstats.undp.org/en/countries/country\_fact\_sheets/cty\_fs\_IND.html (date of access, May 2010). Reference year is 2007.

The main finding is a robust positive and significant association between democracy and the indicators of human development, even if one controls for factors like the level of economic development. Although causality is difficult to establish, besides its intrinsic value, democracy seems to be instrumental to achieving better health and education. However, the interaction effects between democracy and the presumed conditions of functioning turn out to be insignificant or not robust.
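As a rough illustration of the kind of specification described for Essay 1, a panel regression with a single interaction term could be written as follows. The notation is an assumption made here for exposition and is not taken from the essay: $HD_{it}$ stands for a human development indicator (life expectancy at birth or the literacy rate) in country $i$ and period $t$, $D_{it}$ for a democracy measure, $C_{it}$ for one presumed condition (for example income inequality), $X_{it}$ for further controls, and $\mu_i$ and $\lambda_t$ for country and period effects, whose inclusion is likewise assumed.

$$
HD_{it} = \beta_0 + \beta_1 D_{it} + \beta_2 C_{it} + \beta_3 \,(D_{it} \times C_{it}) + \gamma' X_{it} + \mu_i + \lambda_t + \varepsilon_{it}
$$

Under this sketch the marginal association of democracy with the outcome is $\beta_1 + \beta_3 C_{it}$. The finding summarized above would correspond to a robustly positive $\beta_1$ together with a $\beta_3$ that is insignificant or not robust.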

### **The Role of Social Institutions related to Gender Inequality**

As pointed out in North's definition, there are different types of institutions that together determine the extent of capability expansion or deprivation. At the level below those institutions that are mainly concerned with property rights protection, redistribution and contract enforcement in a political system, there are informal social institutions (Williamson, 2000). These are often taken for granted, shape people's identity and provide role models that help people to behave appropriately in daily life without putting efficiency at the forefront (Hall and Taylor, 1996; Peters, 2005). Some of these institutions lead to capability deprivation in the form of social exclusion. Sen's capability approach has been criticized for its view of independence, autonomy and individualism, which fails to highlight social relations (e.g. Nussbaum, 2003; Gaspers and van Staveren, 2003). However, he identifies social exclusion as a constitutive part of capability deprivation as well as a cause of capability deprivation in other dimensions (Sen, 2000b).

The implantation of the "right" formal institutions, e.g. democratic ones, in a country does not guarantee the "right" track towards development, as formal institutions interact with informal ones. Development outcomes then depend on the strength of both formal and informal institutions (Williamson, 2009). Relationships are either complementary or substitutive. Although a formal democratic system may open the space for public discussion, deliberation might be at risk because a deeply rooted power structure and elite domination hinder the participation of all citizens (Gaspers and van Staveren, 2003). The relevance of this issue becomes obvious if social exclusion mechanisms in formally democratic countries are considered.<sup>4</sup> For example, informal institutions that back up social exclusion mechanisms might hinder the extension of the franchise. Racial discrimination against African Americans in the form of 'informal' violence and intimidation or disenfranchising laws restricted the use of the formal right to vote for black people for a long time until finally in 1965 the Voting Rights Act was passed to counteract at least to some extent those discriminatory practices.<sup>5</sup> Another example is Switzerland, where women gained the right to vote at the federal level only in 1971. Including social institutions in the study of development could therefore be a valuable effort.

<sup>4</sup>Gaspers and van Staveren (2003) and Nussbaum (2003) criticize Sen's account because his notion of social justice is underelaborated as it is left to social choice. Moreover, he does not explicitly deal with the problem that freedoms have to be curtailed if social justice and, implicitly, equality are pursued. Nussbaum (2003) therefore claims fundamental entitlements that are independent of people's preferences.

<sup>5</sup>See http://www.justice.gov/crt/voting/intro/intro\_b.php, date of access May 2010.

This is particularly evident if one considers that despite considerable progress in recent decades, gender inequality in the manifold dimensions of well-being remains pervasive in many countries of the world. **Essays 2, 3 and 4** are dedicated to the roots of these inequalities and their heterogeneity across space and time. They center on social institutions related to gender inequality that frame gender-relevant meanings, shape gender roles and become guiding principles in everyday life. Influencing the distribution of power between men and women in the private sphere of the family, in the economic sphere, and in public life, they constrain the opportunities of women and their ability to become agents of development (Sen, 1999b).

**Essay 2**, which is the result of joint work with Boris Branisa and Stephan Klasen, focuses on the measurement of social institutions related to gender inequality. Existing measures are outcome-focused, measuring gender inequality in well-being and agency (Klasen, 2006, 2007), e.g. the Gender-Related Development Index (GDI) and the Gender Empowerment Measure (GEM) (United Nations Development Programme, 1995) or the Global Gender Gap Index from the World Economic Forum (Lopez-Claros and Zahidi, 2005). Other measures like the Women's Social Rights index (WOSOC) of the CIRI Human Rights Data Project<sup>6</sup> could be partially used as a proxy for the institutional basis of gender inequality. However, it also covers outcomes of institutions and, coming from a human rights perspective, it neglects informal institutions and does not differentiate between what happens within the family and what happens in public and social life.

<sup>6</sup>Information is available on the webpage of the project http://ciri.binghamton.edu/ (date of access: April 16, 2010).

Given this lack of measures, **Essay 2** proposes several composite indices measuring social institutions related to gender inequality that can be used to compare the societal situation of women in over 100 non-OECD countries and allow the identification of problematic countries and dimensions of social institutions that deserve attention by policy makers and need to be scrutinized in detail. These are the Social Institutions and Gender Index (SIGI) as a multidimensional measure of deprivation of women, and its five subindices each measuring one dimension of social institutions related to gender inequality (Family code, Civil liberties, Physical integrity, Son preference and Ownership rights). The one-dimensional subindices are built out of variables of the OECD Gender, Institutions and Development database<sup>7</sup> using the method of polychoric PCA to extract the common information of the variables corresponding to a subindex (Kolenikov and Angeles, 2009). The formula of the SIGI is inspired by the Foster-Greer-Thorbecke poverty measures (Foster et al., 1984), which offer a reasonable way to capture the multidimensional deprivation of women caused by social institutions. It has the advantage of penalizing high inequality in each dimension and of allowing for only partial compensation between dimensions.
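The penalizing property of an FGT-inspired aggregate can be sketched in a few lines of Python. The sketch assumes that the SIGI is the unweighted average of the squared subindex values, with each subindex scaled from 0 (no inequality) to 1 (complete inequality). This functional form is an assumption here, since the introduction does not spell out the formula, and the function name `sigi` and both example profiles are hypothetical.

```python
from typing import Mapping

# The five subindices named in the text (identifiers are illustrative).
SUBINDICES = (
    "family_code",
    "civil_liberties",
    "physical_integrity",
    "son_preference",
    "ownership_rights",
)


def sigi(subindex_values: Mapping[str, float]) -> float:
    """Unweighted mean of squared subindex values (assumed FGT-style form)."""
    values = [subindex_values[name] for name in SUBINDICES]
    for value in values:
        if not 0.0 <= value <= 1.0:
            raise ValueError("each subindex is assumed to lie in [0, 1]")
    return sum(value ** 2 for value in values) / len(values)


# Two hypothetical profiles with the same average subindex value (0.2).
even = dict.fromkeys(SUBINDICES, 0.2)
concentrated = {**dict.fromkeys(SUBINDICES, 0.0), "family_code": 1.0}

print(sigi(even))          # mild inequality spread over all dimensions
print(sigi(concentrated))  # severe inequality in a single dimension
```

Both hypothetical profiles have the same mean subindex value, yet the concentrated one receives the higher score. Squaring is what makes high inequality in one dimension only partially compensable by low inequality in the others.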

It is widely accepted that gender inequalities not only harm the affected women but come at a cost for the whole society, leading to ill-health, low human capital, bad governance and lower economic growth (e.g. World Bank, 2001; Klasen, 2002). Due to the scarcity of cross-country level data only a few studies investigate the development impact of gender-relevant social institutions (e.g. Morrisson and Jütting, 2005; Jütting et al., 2008). Applying the newly proposed social institutions indicators, **Essay 3**, which is based on joint work with Boris Branisa and Stephan Klasen, investigates at the cross-country level their explanatory value for development outcomes (female secondary schooling, fertility rates, child mortality and governance in the form of rule of law and voice and accountability). Based on household bargaining models (e.g. Manser and Brown, 1980; McElroy and Horney, 1981; Lundberg and Pollak, 1993), models considering the costs and returns of children (e.g. Becker, 1981; King and Hill, 1993; Hill and King, 1995) as well as contributions from several disciplines on governance and democracy, we derive hypotheses on the impact of social institutions related to gender inequality. The findings from the regression analysis show that social institutions matter even if one controls for religion, political system, geography and the level of economic development; higher inequality in social institutions is associated with worse development outcomes not only for the affected women but also for the whole society.

<sup>7</sup>See Morrisson and Jütting (2005); Jütting et al. (2008)

**Essay 4**, which was produced in collaboration with Boris Branisa, elaborates more on the linkage between social institutions related to gender inequality and governance, contributing to a separate branch of research on gender and corruption. Former research efforts showed that there is a negative statistical association between representation of women in political and economic life and corruption in a society (Swamy et al., 2001; Dollar et al., 2001). Some explanations trace this back to differences in behavior between men and women, some take a historical perspective stating that women are newcomers to the system and therefore behave less corruptly than men (Goetz, 2007) and others mention the possible omitted variable of liberal democracy which might affect both the level of representation and corruption in a society (Sung, 2003). Swamy et al. (2001) proposed another omitted variable, "the level of discrimination against women", which we try to capture using the subindex Civil liberties. The findings of a cross-sectional regression analysis controlling for democracy and representation of women in politics and economic life suggest that corruption is higher in countries where social institutions deprive women of their freedom to participate in social and economic life. In such contexts it might therefore not be sufficient to push democratic reforms and to increase the participation of women in order to reduce corruption.

### **Indigenous Origin and Health Inequality in Bolivia**

Recognizing the pervasiveness of gender inequality in the world, MDG 3 is dedicated exclusively to the situation of women. With respect to other groups, the MDGs are less clear. However, background documents and global initiatives draw attention to indigenous people as they are overrepresented among the world's poor, accounting for about 15% of them, and suffer more from marginalization, poverty and problems in health and education than the non-indigenous population (Hall et al., 2006; Stephens et al., 2006; United Nations, 2010). As a response to these problems the General Assembly of the United Nations proclaimed the Second International Decade of the World's Indigenous People, which started in 2005.

**Essay 5**, a result of a joint project with Elena Gross, focuses on the situation of indigenous people in Bolivia and demonstrates that ethnic origin is a decisive factor for child health and for reaching MDG 4, "Reduce child mortality". At first glance, this seems to be a settled fact that needs no further study. However, most of the studies reporting differences between indigenous and non-indigenous people are based on descriptive and bivariate evidence (e.g. UDAPE and OPS, 2004; Pozo et al., 2006; PAHO, 2007). Although social exclusion and institutional mechanisms are relevant factors for racial differences in well-being, this view might not be sufficient to design policy interventions. It falls short of considering other factors that might be related to both ethnic origin and health, like poverty, urban-rural differences, geographical location and other household-related characteristics - linkages that can be observed for Bolivia. Even where multivariate analyses are conducted, there are shortcomings (e.g. Larrea and Freire, 2002; Morales et al., 2004; Mayer-Foulkes and Larrea, 2005). The first is the neglect of the heterogeneity of health inequality across different health outcomes. The second is related to the use of the indigenous dummy, which masks heterogeneity within the group of native people, bearing in mind that there are over 30 distinct indigenous groups living in Bolivia. Our study investigates several indicators of childhood diseases and vaccinations, taking these shortcomings into consideration. The main lesson is that ethnic origin matters. However, one should go beyond indigenous origin (Quechua, Aymara, etc.) and look for factors that capture characteristics of the mother in particular, like her health knowledge or education, which might be related to the heterogeneity in health outcomes across ethnic groups. A hypothesis that arises from Essay 5 and would need further investigation is that these characteristics of the mother might be intervening variables between ethnic origin and health outcomes. However, this should be investigated in addition to, not instead of, analyzing the institutional mechanisms that might lead to the deprivation of these groups.

To summarize: the five essays of this dissertation contribute to the understanding of the linkages between institutions, inequality and development and emphasize the role of group-based disparities related to gender and ethnicity within this triangle. They confirm that institutions matter and that they influence not only the level of development but also inequality in development outcomes. The essays also show that talking about institutions in general is of limited use if policy implications are to be drawn. Instead one should distinguish between political and social institutions and differentiate within these types of institutions. Moreover, this dissertation contributes to a discussion about the mechanisms that relate different types of institutions to development outcomes, and it highlights factors that might influence the functioning of these mechanisms, either by interacting with institutions in the production of development outcomes or by acting as intervening variables. Concerning democracy, no robust pattern of interacting factors in the production of development outcomes has been found. With regard to social institutions, a first step towards identifying possible mechanisms is taken and relationships are investigated. Furthermore, learning processes or policies that change incentive structures are considered as possibilities to change these institutions. Concerning differences in health outcomes across ethnic groups in Bolivia, it can be argued that these differences are due to latent institutions that distribute power along ethnic lines. However, it is shown that variables like mother's education or health knowledge make the effect of ethnic origin vanish, and further investigations could focus on their role as intervening variables with the potential to counteract the effect of institutions.

# **Chapter 1**

# **Political Institutions and Human Development: Does Democracy Fulfill its 'Constructive' and 'Instrumental' Role?**<sup>1</sup>

## **1.1 Introduction**

Since Sen (1988, 1991, 1999b,a, 2003), we have been aware of the fact that development is a very encompassing and broad concept. Development can be seen as enhancing each individual's capabilities, which define the freedoms to choose the kind of life they value in accordance with individual preferences. This approach has inspired the emergence of a pluralist and integrative conception of 'human development' and its operationalization in the form of UNDP's Human Development Index. It is not only income, but also health and education and other dimensions that enable people to shape their lives in line with their desires. The aim of this paper is to discuss the contribution political institutions might make to enhancing non-income human development measured in terms of education or health. We choose education and health as both aspects are direct determinants of capabilities and both influence the freedom to choose the kind of life one wants. Education as well as health raises productivity and the ability to convert income and resources into the favored way of life (Sen, 2003). The third dimension of human development, namely income, is not of interest for this paper, since a detailed literature on the relation between democracy and economic development is already available.

Political institutions are a critical area of research as they organize social, economic and political life. Hence, an obvious question is what kinds of institutions do this job best. From

<sup>1</sup>joint work with Sebastian Vollmer

a perspective of freedom, democracy has the advantage that its beneficiaries are free to take decisions about their lives and play a part in shaping societal decisions. Therefore, democracy is also considered as an end of the development process and a piece of the puzzle of the more comprehensive picture of human development (Sen, 1999b,a, 2000a). But whether democracy indeed has a positive impact on economic and human development is not a trivial question - either from a theoretical or from an empirical perspective. With regard to theory, three major debates are centered on the instrumental value of democracy for economic development:

First, there seems to be a controversy concerning the contradictory effects of growth-enhancing property-rights protection and equalizing, market-correcting redistribution in democratic societies on growth and well-being (e.g. Mohtadi and Roe, 2003; Tavares and Wacziarg, 2001; Alesina and Rodrik, 1994; Baum and Lake, 2003). Positions that emphasize the deficiencies of democratic systems may support the Lee Hypothesis, named after the former Prime Minister of Singapore, Lee Kuan Yew, which states that autocratic regimes are more efficient systems to tackle market failures, to stimulate economic growth and as a consequence to improve human development (Sen, 1999a).

A second debate revolves around causation: is democracy a cause or a consequence of the development process? Taking a historical perspective, this debate was initiated by Lipset (1959) who emphasized the modernization process including progress in education, industrialization and urban development as driving forces of democracy. Other examples of these enhancing or impeding forces are income inequality and country-specific and historical characteristics. Several authors have dedicated their work to identifying these factors and/or to filtering out the effect of democracy or democratization on development by controlling for these factors (Barro, 1999; Bourguignon and Verdier, 2000; Persson and Tabellini, 2007b; Glaeser et al., 2007; Papaioannou and Siourounis, 2008; Acemoglu et al., 2008).

Third, in addition to the historical perspective, one could also take a more contemporaneous view, as there might also be factors that shape the functioning of a democratic system. It is still not obvious under which conditions democracies function well, and there is surely an overlap with the factors that make democratization work. For example, Keefer and Khemani (2005) and Besley and Burgess (2002) highlight the information available to voters and social fragmentation; Collier (2001), Mauro (1995), Alesina et al. (1999), Miguel and Gugerty (2005) and others draw attention to ethnic fractionalization, which could disturb the provision of public goods and foster corruption; others, like Keefer (2005), focus on the age of democracy.

Empirical research studies give no clear answer concerning the effect of democracy on growth. Minier (1998) and Papaioannou and Siourounis (2008) show that the efficiency argument in favor of autocratic regimes does not withstand empirical investigations. They find a positive effect of democracy on economic growth. Others, on the contrary, find a moderately negative (Tavares and Wacziarg, 2001), nonlinear or heterogeneous relationship between democracy and growth, assumed to be due to, e.g., the maturity of a democratic system, rent-seeking, or the details of democratic reforms (Barro, 1996; Persson and Tabellini, 2006, 2007b).

When studies focus on the effect of democracy on redistribution, operationalized as the provision of public goods, the size of the public sector and income inequality, the results are less ambiguous (Alesina and Rodrik, 1994; Boix, 2001; Lake and Baum, 2001; Besley and Burgess, 2002; Gradstein and Milanovic, 2004; Stasavage, 2005; Persson et al., 2000). In general, they support the view that redistribution might be higher under a democratic regime, without clearly answering the question whether this redistribution is beneficial to economic and non-income human development.

Concerning the non-income dimensions of human development, there is again uncertainty about the effects of democracy. There are only a few studies empirically investigating the links between political systems and measures for the non-income dimensions of human development. Whereas some find a positive relationship between democracy and human development measured in terms of health and education (Lake and Baum, 2001; Baum and Lake, 2003; Navia and Zweifel, 2003; Franco et al., 2004; Besley and Kudamatsu, 2006; Tsai, 2006; Safaei, 2006), others find less evidence for this influence (Ross, 2006). These research efforts are either confined to only one of the non-income dimensions of human development (Navia and Zweifel, 2003; Franco et al., 2004; Besley and Kudamatsu, 2006; Ross, 2006; Safaei, 2006) or to a cross-sectional focus leaving out developments over time (Franco et al., 2004; Tsai, 2006). Moreover, these investigations, while having in mind potential conditions influencing democracy's performance, include these requisites only as simple controls in their regression models and not in interaction with some institutional measure. Exceptions are for example the studies of Boix (2001) and Baum and Lake (2003) who build interaction terms between democracy and different levels of GDP or the Gini index to capture the distinct effects of democracy in countries with different levels of income and inequality.

Acknowledging the shortcomings of the literature, in this paper we want to extend the latter strand of research in the following ways: we want to answer the questions of whether political institutions are related to the living standard of the population and whether our empirical data support the view that democracy, besides its intrinsic importance for the development process, fulfills a constructive and instrumental role by giving people the opportunity to express, form and aggregate their preferences and thus to steer public action in an efficient and effective manner (Sen, 1999b). To provide an answer we complement the arguments provided by Sen with theoretical implications of the median voter theory, which is an innovative way to think about the quality and quantity of redistribution and public service provision in political regimes. A second contribution to the literature consists in theoretically identifying and empirically testing conditions under which democracies will display a positive effect - given they are supposed to have one - on the provision of public goods and services that are assumed to foster human development. Consequently, we are not interested in explaining democratization but in investigating the potential dependence of democracy's performance upon other factors once it is in place. We empirically test the relationship between political institutions and the levels of education and health, using these indicators as proxies for non-income human development. The empirical investigation is based on a panel data set including all countries for which information is available, which allows us to consider the time dimension in our analysis. A last contribution consists in empirically estimating interaction effects between the conditions of democracy's performance and a democracy variable.

In section 2, we review theories of political institutions, democracy and human development. In section 3, we examine the empirical evidence for this relationship. In section 4, we conclude. Our results indicate that democracy is favorable for human development even after controlling for the level of economic development. But contrary to the theoretical reasoning, there is no clear evidence for the factors that, according to the literature, are supposed to influence democracy's performance. It seems to be democracy itself - largely independent of the circumstances - that has a positive effect on human development. It is particularly remarkable that democracy's performance seems not to depend on a certain level of economic development.

## **1.2 The Political Economy of Democracy and Human Development**

### **1.2.1 How Can Political Institutions Influence Human Development?**

With regard to a definition and the resulting operationalization of institutions, the existing literature leaves the impression that there is not enough precision concerning the term "institution" itself. There is a heavy use of performance indicators measuring the extent to which certain institutional systems function, e.g. when it comes to political stability or governance issues (Gradstein and Milanovic, 2004).<sup>2</sup> Such performance indicators are then often mixed up with public policies. However, both measured performance and policies are the output of underlying structures and procedures as well as contextual factors. These underlying (formal) structures and procedures can be subsumed under the heading "political system". This is what we understand by political institutions.

<sup>2</sup>See for example the Worldwide Governance Indicators of Kaufmann et al. (2007).

According to the rational choice strand of the new institutionalism in political science or the field of new institutional economics and political economy, political institutions shape the rules which govern the political game (Hall and Taylor, 1996; Persson and Tabellini, 2000; Peters, 2005). They do not only determine, via electoral rules, the actors and preferences which can access the political arena and get heard. They also provide the means to aggregate those preferences by establishing procedures for decision-making and distributing political power (Persson, 2002). The common output of institutions and preferences is policies. Although actors and other environmental constellations may change over time, policies in general will reflect the political institutions that produced them (Peters, 2005; Persson and Tabellini, 2006). Two types of policies may be favorable to human development: policies for the protection of property rights and policies for redistribution.

Policies for the protection of property rights contribute to economic development and economic growth (Acemoglu et al., 2002). Growth increases the welfare of the population by reducing poverty at least in the longer term (Dollar and Kraay, 2002; Klasen, 2004; Kraay, 2006). Therefore, property-rights protection is a necessary condition for an increase in the overall wealth of a nation (Acemoglu et al., 2001, 2002). But whether all members of this nation can benefit from it highly depends on redistribution as well. Policies for redistribution equalize the distribution of income and welfare in a society. A trade-off between the two types of policies might occur as on the one hand property-rights protection enhances development by securing investments into physical capital but is not concerned about distributional aspects of the costs and benefits. On the other hand, redistribution fosters human capital and lowers income inequality, but might hinder investments into physical capital and disturb incentives on the labor market and moreover, it might lead to rent-seeking activities (e.g. Mohtadi and Roe, 2003; Tavares and Wacziarg, 2001; Alesina and Rodrik, 1994; Baum and Lake, 2003).

Despite the potential negative effects of policies of redistribution, we consider them as essential to achieve progress in non-income indicators of human development like health and education. This type of policy comprises broad-based programs and covers the provision of public goods and services. These policies aim at compensating for market failures and at achieving normative, social optima. Especially the poor are given access to goods and services which are not sufficiently provided by markets. The matching of society's and an individual's needs with an adequate redistribution scheme and an appropriate public provision of goods and services provides a more direct link between political institutions and human development than property-rights protection. Following this line of argumentation the following question arises: what is a political system that is appropriate to produce market-correcting redistributive policies that are designed to match the needs of society and have the potential to advance non-income human development? The answer is democracy.<sup>3</sup> Democracy is conceived as a political system whose structures and procedures permit the rule of the people. Of importance are free and repeated elections and political competition, the rule of law, and political and civil liberties. These component parts frame public debate and deliberation that deal with the management of society.

Although redistribution from the rich to the poor and vice versa exists in both autocratic and democratic systems, the following theoretical arguments suggest that redistribution from the rich to the poor is more pronounced in democracies.<sup>4</sup> One of the best-known theoretical arguments is the model of Meltzer and Richard (1981). The median voter hypothesis states that in democratic governments the median voter is the decisive voter. The more her income falls short of the average income of all voters, the higher the tax rate, i.e. the degree of redistribution, she will choose. Therefore, government spending should be larger and social services more extensive in democratic regimes - if the majority of the voting public lives at the bottom of the income distribution and only a small part is rich (Keefer and Khemani, 2005). In contrast, in authoritarian systems, the distribution of wealth does not play a decisive role. All or a substantial part of the electorate is excluded from the decision-making process, and this is precisely to avoid the redistributive consequences of democracy. As a result, the average size of the public sector and public spending remains quite small (Boix, 2001).
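The logic of the median voter hypothesis can be illustrated with a minimal sketch. It does not reproduce the model of Meltzer and Richard (1981), which works with an endogenous labor supply. Instead it assumes a simplified textbook setup in which a linear tax finances an equal lump-sum transfer and taxation causes a quadratic efficiency loss. The function names and the income figures are hypothetical and chosen only for illustration.

```python
from statistics import mean, median


def preferred_tax_rate(own_income: float, mean_income: float) -> float:
    """Tax rate in [0, 1] that maximizes a voter's post-tax consumption.

    Assumed (hypothetical) setup: a linear tax t on income funds an equal
    lump-sum transfer, and taxation wastes t**2 / 2 of mean income, so
    consumption is (1 - t) * own_income + (t - t**2 / 2) * mean_income.
    Setting the derivative with respect to t to zero gives
    t = 1 - own_income / mean_income, clamped to the interval [0, 1].
    """
    if mean_income <= 0:
        raise ValueError("mean_income must be positive")
    return min(1.0, max(0.0, 1.0 - own_income / mean_income))


def median_voter_tax_rate(incomes: list[float]) -> float:
    """Tax rate chosen when the voter with the median income is decisive."""
    return preferred_tax_rate(median(incomes), mean(incomes))


# Illustrative, made-up income distributions (not data from this study).
fairly_equal = [8.0, 9.0, 10.0, 11.0, 12.0]   # median equals mean
right_skewed = [2.0, 3.0, 4.0, 6.0, 35.0]     # median far below mean

print(median_voter_tax_rate(fairly_equal))
print(median_voter_tax_rate(right_skewed))
```

Under these assumptions the decisive voter demands no redistribution when her income equals the mean and demands more the further her income falls below it, which is the comparative static the argument above relies on.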

Another line of argumentation brought forward by Lake and Baum (2001) emphasizes the state's monopoly to use force legitimately in producing public services that mitigate market failures. In a democratic regime, these services are provided in larger quantity and at lower prices, as barriers for political competitors and costs for political participation are low compared to autocracies. In autocracies the emphasis will be more on earning rents than on providing public services, assuming that earning rents is a function of the provision of public services and restricting the supply of services will increase rents. This argumentation also supports the hypothesis that democracies will provide higher levels of public services to their citizens.

However, quantity does not imply quality. In other words, voting alone does not solve the aggregation problem resulting from different individual preferences. Thus, a second question related to the qualitative dimension of redistribution emerges: why are democratic governments more responsive to the needs of the citizenry compared to autocratic ones? According

<sup>3</sup>Democracies are considered to perform best on both dimensions: property-rights protection and redistribution. Whether the one or the other is more important depends on people's preferences and the formal and informal face of the considered democracy.

<sup>4</sup>See for example Gradstein and Milanovic (2004) for an empirical study finding evidence for this linkage.

to Sen (1999b,a), democracy - beyond its "intrinsic" value - is of eminent importance for the process of development because of the "constructive" and "instrumental" role it plays in the formation and aggregation of values, needs and preferences and their translation into well-designed policies benefiting the society. Being constituent features of a democratic system, political and civil liberties, for example those related to free speech, public debate and criticism, permit the formation of preferences and values as well as access to the relevant information so that societal needs are visible. Democratic procedures facilitate the transmission of these needs into the political arena where decision-making power is distributed amongst legitimate representatives of the society as a whole.<sup>5</sup>

Democracy does not only help to construct policies that are matched to the needs of its citizens, but is also instrumental and protective. Control mechanisms such as free and repeated competitive elections and the compliance with the rule of law principle reduce discretionary and corrupt behavior of representatives who hold political power. Thus, democracy provides the incentives to create responsibility and accountability that induce political-administrative leaders to listen and to act on behalf of the society they represent (Sen, 1999b,a).

In an autocratic regime a usually small ruling elite dictates "the will of the people" from above. This is frequently accompanied by a repression of the political opposition and the prohibition of free expression and opinion, thereby impeding the conceptualization of the volonté générale. The state apparatus is (mis-)used in favor of the welfare of the ruling elite. Political measures with a redistributing character that increase the welfare of the bottom quantiles of society are implemented not because of institutional structures but for ideological reasons and only to a level that will help autocrats to remain in power and to increase their own wealth (Olson, 1993; McGuire and Olson, 1996). Responsiveness, representation, accountability and the selection of competent political and administrative staff thus are uncommon in autocratic regimes (Besley and Kudamatsu, 2006).

Summarizing, democracies quantitatively and qualitatively outperform autocracies with respect to redistribution. There is no clear relation between inequality and societal needs on the one hand and redistribution on the other hand in autocracies, except for those, generally socialist ones, with a special commitment to universal welfare. In general, this leads to a lower level of human development in autocratic systems.

<sup>5</sup>The latter means that otherwise disadvantaged groups, whether they are minorities or a broad mass of poor people in a developing country, get a voice and the opportunity to be heard and represented. In cases of direct democracy or democracy at a local level, these groups even decide for themselves.

### **1.2.2 What Determines Public Service Provision in Democracies?**

The formal existence of democracy does not guarantee that it functions in the idealized manner described above. Democratic regimes display a lot of heterogeneity regarding human development outcomes. This is due to factors that determine whether the relationships predicted by the median voter theory or Sen's theory work or not. These factors then hamper or foster the performance of democracy with regard to the satisfaction of societal needs. Problems could arise if for certain reasons - located either at the agenda setting, the policy formulation, the implementation or the evaluation phase - the allocation of public expenditures is inefficient.<sup>6</sup>

Our approach to explain heterogeneity in any democracy's performance follows that of Keefer and Khemani (2005) and hence differs from other studies that focus more on the preconditions for democracy and democratization (e.g. Lipset, 1959; Glaeser et al., 2007). We do not consider the question whether a country has to be prepared for democracy or whether it is democracy that lifts the country up to a certain level of development.<sup>7</sup> Following our theoretical reasoning, the necessary timing of the presence of the respective factors is treated here as simultaneous. Their interaction with democracy at one point in time influences the output, the policies in the form of public goods provision, and the outcome, the level of human development.
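One generic way to write down such an interaction is sketched below. It is not the specification estimated in this chapter, and all symbols are illustrative assumptions: $HD_{it}$ is a human development outcome in country $i$ at time $t$, $D_{it}$ a democracy measure, $X_{it}$ one of the conditioning factors, and $\varepsilon_{it}$ an error term.

$$
HD_{it} = \beta_0 + \beta_1 D_{it} + \beta_2 X_{it} + \beta_3 \,(D_{it} \times X_{it}) + \varepsilon_{it},
\qquad
\frac{\partial HD_{it}}{\partial D_{it}} = \beta_1 + \beta_3 X_{it}.
$$

In a model of this assumed form, the association between democracy and the outcome depends on the level of $X_{it}$ whenever $\beta_3 \neq 0$. If $\beta_3 = 0$, the association is the same regardless of the conditioning factor.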

First, as redistribution and the provision of public goods depend on whether there is anything to redistribute and to invest in public goods, the performance of a democratic system will be better the higher the level of economic development is (Boix, 2001; Baum and Lake, 2003).<sup>8</sup> So the positive effect of democracies on public goods provision will be intensified by the level of economic development.

Secondly, if citizens are ill-informed, this may lead to insufficient participation, which would be necessary for public reasoning and the expression of 'qualified' needs. As a result, the quality of responsive government manifesting itself in policies that reflect society's demands and needs decreases. Moreover, accountability suffers from information constraints because voters cannot control politicians' behavior. Education is one of the important factors as it has a potential to alleviate information problems.<sup>9</sup> Education in this context is not taken

<sup>6</sup>Because poor people are highly dependent on public action as they cannot invest their own (nonexistent) private resources, they suffer the most from ineffective government in terms of redistribution and service provision (Keefer and Khemani, 2005).

<sup>7</sup>Hence, we follow the statement of Sen (1999a, p. 4): "A country does not have to be deemed fit for democracy; rather, it has to become fit through democracy."

<sup>8</sup>On the effect of income on health there is a literature concerned with the absolute income hypothesis that states that income affects individual health but at a diminishing rate (e.g. Karlsson et al., 2010).

<sup>9</sup>Other factors might be a well developed media sector and accountable and institutionalized parties that take over political education tasks (see Keefer and Khemani, 2005). But it can easily be argued that without a certain

as an intrinsic component of human development that we want to explain, but as a means to human development. It is not only in itself a precondition for a higher living standard because it positively affects earnings, health and so on. It is also found to be a requirement for democracies to develop and to persist as it leads to conscientious participation that may be related to an efficient and effective provision of public goods (Lipset, 1959; Glaeser et al., 2007; Keefer and Khemani, 2005).<sup>10</sup>

Social fragmentation can be another factor disturbing the functioning of a democratic system measured by the public goods it provides. Research has found that social fragmentation or, to be more precise, ethnic diversity leads to collective action problems, increased patronage as well as clientelism and in the end to an under-provision of public goods (Alesina et al., 1999; Alesina and Ferrara, 2005; Miguel and Gugerty, 2005). Within democratic systems, social fragmentation may pose problems because mechanisms which would hold the government accountable and responsible are undermined. In socially heterogeneous settings, governments are rewarded on the basis of identity and not of their performance (Keefer and Khemani, 2005). Moreover, social fragmentation leads to political fragmentation, which from a certain threshold value can result in increasing co-operation problems (Collier, 2001).

The last factor that is in line with the quantity-redistribution argument is income inequality, characterized by a distribution of income where the median income is smaller than the average income.<sup>11</sup> The following argumentation supports the idea that in a high income-inequality context democratic systems might provide more health and education services due to stronger redistributive pressures by the median voter. As a starting point it is necessary to understand how income inequality affects health and education. Income inequality reduces human development because, in more unequal societies, fewer people can afford to live a healthy life and to spend their money on education. Moreover, income inequality may lead to stress and frustration harming health, and according to e.g. Wilkinson (1992), Kawachi et al.

level of broad-based education, a media sector will not develop because of a lack of demand (for the role of the media see Besley and Burgess, 2002). The same is supposed to hold for the institutionalization of parties and accountability issues.

<sup>10</sup>We leave out cultural factors as they are hard to measure. Inglehart and Welzel (2005) emphasize the people's values as being as important as socioeconomic resources and civil and political rights. According to these authors, culture provides the link between economic development and democratic freedom. Without certain values like "human autonomy" or "self-expression values", fostering the priority of self-made choices, human development might not be possible. Such values are dependent upon a certain level of socioeconomic development that we might proxy by taking the level of economic development into account. Moreover, we assume - although this is to be questioned - that the more education people have the more enlightened they are and the more freedom they demand to live the life they value.

<sup>11</sup>The argument that the median voter is farther away from the mean when a society is more unequal is true for right-skewed distributions. This is usually the case for national income distributions, which are quite close to log-normal distributions.

(1997) and Karlsson et al. (2010), income inequality leads to higher mortality as social cohesion breaks down. Income inequality leads to a residential concentration of the poor and the rich that gives rise to a segregation hindering social cohesion. Poor people living in a poor neighborhood not only have to cope with a lack of income but also with a worse infrastructure related to e.g. schooling or health so that their situation cannot improve, whereas the rich invest in their neighborhood, in particular in human capital, health care and other factors. As the share of poor people rises and segregation worsens, the levels of health and education in a society deteriorate. In segregated and polarized societies the provision of public goods worsens. Moreover, income inequality spurs crime and violence, affecting health directly.<sup>12</sup> If redistributive pressures increase, meaning that the distance of the median voter's income from the average income becomes larger, then in democratic systems according to the median voter theory more redistribution will be demanded (Meltzer and Richard, 1981). Whether the redistribution is in the form of education and health services or income transfers is open, as high income inequality does not necessarily imply high inequality in education and health (Grimm et al., 2008). In such a context the median voter could be healthier than the average and more literate as well. Therefore, it is not obvious why the median voter should demand more health or education services. However, redistribution of any kind is expected to compensate for the negative effects of income inequality.<sup>13</sup> Autocratic regimes lack such a mechanism. Moreover, democratic regimes foster the rise of civil society organizations that preserve social cohesion and capital and take over tasks that are insufficiently fulfilled by the state (Safaei, 2006).
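The statement in footnote 11 about right-skewed distributions can be made explicit with a short worked example. As an illustrative assumption, suppose income $y$ is exactly log-normally distributed, $\ln y \sim N(\mu, \sigma^2)$; the symbols $y$, $\mu$ and $\sigma$ are introduced here and do not appear in the text.

$$
\operatorname{median}(y) = e^{\mu},
\qquad
\operatorname{E}[y] = e^{\mu + \sigma^2/2},
\qquad
\frac{\operatorname{median}(y)}{\operatorname{E}[y]} = e^{-\sigma^2/2} < 1 \quad \text{for } \sigma > 0 .
$$

Under this assumption a larger dispersion $\sigma$ pushes the median voter's income further below the mean. For $\sigma = 1$, for instance, the median is about 61% of the mean. The Gini coefficient of a log-normal distribution, $2\Phi(\sigma/\sqrt{2}) - 1$, rises with $\sigma$ as well, so higher measured inequality and a larger gap between median and mean go together in this special case.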

### **1.2.3 Summary and Working Hypotheses**

Summarizing the theoretical arguments above, we can state that democratic regimes in comparison to autocratic ones are expected to produce a higher rate of redistribution and thus lead to higher public expenditures. Public spending priorities in democracies reflect the needs of the society more than those in autocracies, and democratic control mechanisms will assure the implementation of policies so that a high degree of compliance with laws, directives and orders is reached. Hence, public action can translate into the desired human development outcomes, for example a better health status of the population or a lower illiteracy rate. But the performance of democracies will vary according to the specific circumstances. We assume that the level of income, education, social fragmentation and the level of income inequality

<sup>12</sup>The mechanisms which reflect how income inequality might affect health are subsumed under the income inequality hypothesis which states that income inequality in a society affects the health of every member of the society (e.g. Karlsson et al., 2010).

<sup>13</sup>Boix (2001) states that political participation is an important condition for inequality to be translated into redistributive pressures in a democratic regime.

all affect the level of the provision of public goods and human development in a democratic system. Therefore, the following general hypotheses can be derived:


## **1.3 Empirical Links Between Democracy and Human Development**

### **1.3.1 Empirical Implementation**

To quantify human development, we focus on the non-income components of UNDP's Human Development Index and consequently use UNDP's data on life expectancy at birth and on literacy rates. Life expectancy at birth is measured in years, whereas the literacy rate is an index value ranging from 0 to 100. We choose education and health as both aspects are direct determinants of capabilities and as they both influence the freedom to choose the kind of life one likes. Education as well as health raises productivity and the ability to convert income and resources into the favored way of life (Sen, 2003). The third dimension of human development, namely income, is not of interest for this paper, since an extensive literature on the relation between democracy and economic development is already available. Our data on political institutions are taken from the Polity IV Project of the Center for Systemic Peace at George Mason University (Marshall and Jaggers, 2005). We use the Polity2 score as our *Democracy* variable, ranging from 10 (highly democratic) to -10 (highly autocratic), while a zero score indicates a state between autocracy and democracy which we consider as not being democratic.<sup>14</sup> Systems scoring around the zero point exhibit traits of both autocratic and democratic systems and are therefore transitory regimes, but to facilitate the further examination we classify regimes with a score above zero as democratic and all others as autocratic.

Following Besley and Kudamatsu (2006), we take the fraction of democratic years over the past five years as our measure for democracy (*Demexp*) to capture democratic experience. As an alternative measure for democracy we calculate the average Polity2 score over the past five years (*Mpol*), which allows us to consider the quality of a democratic or autocratic system. Unlike *Mpol*, *Demexp* does not mask transitions from democracy to autocracy and vice versa, but it classifies all countries with a score between -10 and 0 as autocratic systems and all others as democratic, hiding differences within each group. *Mpol*, by contrast, allows us to differentiate within the groups of democratic and autocratic countries.
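The difference between the two measures can be illustrated with a minimal sketch. The function names and the three five-year windows are hypothetical and serve only as an illustration; whether the window includes the current year is not stated in the text and is left open here.

```python
def demexp(polity2_scores):
    """Fraction of years in the window with a Polity2 score above zero."""
    return sum(1 for s in polity2_scores if s > 0) / len(polity2_scores)


def mpol(polity2_scores):
    """Average Polity2 score over the window."""
    return sum(polity2_scores) / len(polity2_scores)


# Hypothetical five-year windows of Polity2 scores (illustrative values only).
windows = {
    "stable democracy": [10, 10, 10, 10, 10],
    "weak democracy": [1, 1, 1, 1, 1],
    "transition": [-7, -7, 2, 8, 8],  # autocracy turning democratic mid-window
}

for label, scores in windows.items():
    print(label, demexp(scores), mpol(scores))
```

In this example the stable and the weak democracy receive the same *Demexp* value although their average scores differ strongly, while the transition case obtains an average score close to that of the weak democracy although its regime changed within the window.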

Considering a period of five years has the advantage of yielding a more stable value for the democracy measure used. Another reason for the five-year period is that the values of life expectancy and literacy are not updated annually but roughly every five years. One might argue that it is arbitrary to take five years and not ten, but with this choice we are in line with the existing literature (e.g. Besley and Kudamatsu, 2006), and our study is therefore comparable. Having different democracy measures is important as a check of robustness.

Other variables we expect to have an impact on human development or that describe possible conditions under which democracy affects human development are the following: *GDP* per capita PPP in constant prices<sup>15</sup> from the Penn World Tables 6.2; *Gini* coefficients from the WIDER dataset with improvements in terms of comparability across countries and time by Grün and Klasen (2008)<sup>16</sup>; a measure of ethnic fractionalization (*Fractional.*)<sup>17</sup> as

<sup>14</sup>According to the Polity2 measure, a system can be classified as democratic if three interdependent elements exist: 1) competitiveness of participation, institutions and procedures allow citizens to express their political preferences; 2) openness and competitiveness of executive recruitment and constraints on the chief executive, so that the executive power is institutionally constrained; 3) civil liberties. The last element as well as rule of law, system of checks and balances, freedom of the press etc. is not coded in the index as the latter are performance indicators of democratic regimes. Autocracies are defined vice versa. For more details see Marshall and Jaggers (2005: 13f.).

<sup>15</sup>US\$, base year: 2000.

<sup>16</sup>Gini coefficients are not available for every year. We therefore use a simple moving average between available observations to complete the dataset. The reference category for the Gini coefficients is gross income per capita.

<sup>17</sup>The ethnic fractionalization measure renders the probability that two individuals selected at random from a population are members of different groups. It is calculated with data on language and origin using the following formula: $FRAC_j = 1 - \sum_{i=1}^{N} s_{ij}^2$, where $s_{ij}$ is the proportion of group $i = 1, \dots, N$ in country $j$. The index ranges from complete homogeneity (an index of 0) to complete heterogeneity (an index of 1). For more details see Alesina et al. (2003).

proxy for social fragmentation from Alesina et al. (2003) which is constant over time.<sup>18</sup> Since education is a factor assumed to influence the performance of democracy, literacy rates are also used as an explanatory variable in our panel analysis for life expectancy but are omitted in the analysis of literacy itself. As additional control variables, we consider whether a country experienced conflict in the period under observation and whether a high percentage of the population is suffering from HIV/AIDS. To measure war, we take data from the UCDP/PRIO intrastate conflict onset dataset, 1946-2006. We choose the variable warinci2 (*War*) that measures the incidence of intrastate war and is coded 1 in all country years with at least one active war.<sup>19</sup> For HIV/AIDS, we take adult (15-49 years) HIV prevalence rates (*Aids*) from the 2008 Report on the global AIDS epidemic from UNAIDS/WHO. Data coverage over time and countries led us to create a dummy variable that takes the value 1 when a country had a prevalence rate over 5 per cent in the year 2003. To take the heterogeneity between autocracies into account we introduce a simple *Socialism* dummy to represent autocracies with a commitment to universal welfare (Safaei, 2006). The dummy takes the value one for all Eastern European countries until 1990, Vietnam until 1980, China until 1975, and for Cuba and North Korea until today.
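The fractionalization index used as a proxy for social fragmentation is one minus the sum of squared group shares. The following minimal sketch shows the calculation; the function name and the group shares are hypothetical and are not taken from the Alesina et al. (2003) data.

```python
def fractionalization(shares):
    """Return 1 minus the sum of squared group shares of one country."""
    if abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("group shares must sum to 1")
    return 1.0 - sum(s * s for s in shares)


# Hypothetical group structures (illustrative values only).
one_group = fractionalization([1.0])                        # complete homogeneity
two_equal_groups = fractionalization([0.5, 0.5])
four_equal_groups = fractionalization([0.25, 0.25, 0.25, 0.25])

print(one_group, two_equal_groups, four_equal_groups)
```

The index rises with the number of groups and with the evenness of their shares, which matches its interpretation as the probability that two randomly selected individuals belong to different groups.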

We suspect that democracy causes different priorities in public expenditures compared to autocracies. Therefore, increases in public expenditures on health and education can be decomposed into two components: an increase due to higher total expenditures and an increase due to different priorities in government spending. While the first source is mainly driven by economic growth, we expect democracy to be a main driver of the second source. We were unable to gather sound data for government spending for the given period. Such data would have enriched our analysis as we could have examined the channels that democracy takes to affect human development. The available data on public expenditures in health and education were not adequate for our analysis. Only for the more recent years does the Government Finance Statistics of the IMF include sufficient information concerning these issues. Thus, neither the public expenditures' path of causation nor the channel of private spending can be investigated here due to data restrictions. We must therefore rely on the theoretical argumentation that underpins our empirical analysis.

<sup>18</sup>According to Alesina et al. (2003) the assumption of stable group shares is not a problem as examples of changes in ethnic fractionalization are rare. At least over the time-horizon of 20 to 30 years, time persistence can be assumed.

<sup>19</sup>War is defined by more than 1000 battle deaths. As intrastate wars are more frequent than interstate wars, we decided to take the intrastate war variable.

### **1.3.2 Descriptive Statistics**

In 1970, we have 44 democratic countries and 99 autocracies in our dataset. In 2000, there were 97 democracies and 58 autocracies. In the data on which we base our estimations and for the time span 1966 to 2000 that is used for calculating our proxy measures for democracy, there are 32 countries with a polity2 score larger than 0 and 35 countries with a polity2 score smaller than or equal to zero.<sup>20</sup> If the whole time span from 1966 to 2000 is considered and all countries are included that have no missing values on the Polity2 score, in 66 cases there was a transition from a positive polity2 score to a zero or negative one and in 166 cases a transition from a negative or zero polity2 score to a positive one, indicating a transition to democracy.
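Counting such transitions amounts to comparing the sign of the score in consecutive years. The sketch below assumes a yearly series for one country and uses the classification rule from the text (above zero is democratic, zero or below is autocratic); the function name and the series are hypothetical.

```python
def count_transitions(scores):
    """Count regime switches in a yearly Polity2 series of one country.

    Returns (to_democracy, to_autocracy), treating scores above zero as
    democratic and scores of zero or below as autocratic.
    """
    to_democracy = 0
    to_autocracy = 0
    for previous, current in zip(scores, scores[1:]):
        if previous <= 0 < current:
            to_democracy += 1
        elif current <= 0 < previous:
            to_autocracy += 1
    return to_democracy, to_autocracy


# Hypothetical yearly scores (illustrative values only).
series = [-7, -7, 3, 6, 6, -2, -2, 0, 4]
print(count_transitions(series))
```

A move from -2 to 0 is not counted, because both scores fall into the autocratic category under this rule.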

Average life expectancy was 57.39 years and an average of 62.77% of the adult population was literate in the year 1970. In the year 2000 life expectancy had increased to 64.75 years and literacy rates had gone up to 80.44%. In 1970, life expectancy in democratic countries was 60.6 years compared to 55 years in autocratic countries. By 2000, the gap between democratic and autocratic countries had widened from 5.6 to 9.2 years, as people in democracies had an average life expectancy of 67.85 years and those in autocracies only 58.65 years. Literacy rates show the same ordering, with 73.3% literate persons in democratic countries compared to 54.48% in autocracies in 1970, and 85.12% literate in democracies in the year 2000 compared to 69.65% in autocratic systems; here, however, the gap narrowed from 18.82 to 15.47 percentage points.

Besides looking at simple averages it is worthwhile to take a look at the densities of life expectancy and literacy for democracies and autocracies separately (Figures 1.1 and 1.2). We use kernel density estimators for this purpose and apply boundary corrections at 0 and 100 for the literacy rate and at the minimum and maximum values for life expectancy. While in democracies, both for life expectancy and literacy the mass of the distribution tends to the right hand side, there seems to be a group of autocracies with a low level and another one with a high level of life expectancy and literacy each.
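A boundary-corrected kernel density estimate of this kind can be sketched as follows. The text does not state which correction method, kernel or bandwidth was used; the sketch assumes the reflection method with a Gaussian kernel and a hand-picked bandwidth, and the literacy values are hypothetical.

```python
import math


def gaussian_kernel(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def reflected_kde(x, data, bandwidth, lower, upper):
    """Gaussian kernel density at x, reflecting the data at both boundaries."""
    if not lower <= x <= upper:
        return 0.0
    total = 0.0
    for xi in data:
        # Each observation contributes itself plus its two mirror images.
        for point in (xi, 2.0 * lower - xi, 2.0 * upper - xi):
            total += gaussian_kernel((x - point) / bandwidth)
    return total / (len(data) * bandwidth)


# Hypothetical literacy rates in percent (illustrative values only).
literacy = [35.0, 60.0, 85.0, 95.0, 99.0]
density_near_boundary = reflected_kde(99.0, literacy, 8.0, 0.0, 100.0)
density_mid_range = reflected_kde(50.0, literacy, 8.0, 0.0, 100.0)
print(density_near_boundary, density_mid_range)
```

Without the mirror images, part of the probability mass of observations close to 100 would fall outside the admissible range, and the density near the boundary would be understated.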

The same pattern can be observed in Tables 5.7 to 5.14 in Appendix 1 where we classified countries according to three categories: low, middle and high income; autocracy and

<sup>20</sup>The countries with a polity2 score larger than 0 are: Australia, Austria, Belgium, Botswana, Canada, Colombia, Costa Rica, Denmark, Finland, France, Germany, India, Ireland, Israel, Italy, Jamaica, Japan, Malaysia, Mauritius, Namibia, Netherlands, New Zealand, Norway, Papua New Guinea, South Africa, Sri Lanka, Sweden, Switzerland, Trinidad and Tobago, United Kingdom, United States, Venezuela. The countries with a polity2 score smaller than or equal to 0 are: Afghanistan, Algeria, Bhutan, Burundi, Cameroon, Chad, China, Congo, Dem. Rep., Cuba, Egypt, Gabon, Guinea, Iraq, Jordan, Kenya, Korea, Dem. Rep., Lao PDR, Liberia, Libya, Mauritania, Morocco, Myanmar, Oman, Rwanda, Saudi Arabia, Singapore, Syrian Arab Republic, Togo, Tunisia, Yemen as well as Tajikistan, Turkmenistan, Uzbekistan, Kyrgyz Republic, and Kazakhstan taking the former status of the Russian Federation into account.

Figure 1.1: Cross-Country Distribution of Life Expectancy at Birth

Solid line: Kernel density estimator for countries being democratic in the given year. Dashed line: Kernel density estimator for countries being autocratic in the given year. 1970: 41 democracies and 97 autocracies; 1980: 44 democracies and 107 autocracies; 1990: 67 democracies and 85 autocracies; 2000: 97 democracies and 58 autocracies.

democracy; low, middle and high life expectancy or literacy rates.<sup>21</sup> On average, we observe that democracies have a higher life expectancy and a higher literacy rate than autocracies. Exceptions are democracies with low life expectancies, mainly due to the HIV/AIDS tragedy in big parts of Sub-Saharan Africa. Considering the rich group of autocracies especially in 2000, it is striking that virtually all of them are oil states. This indicates, at least to some extent, that autocracies have problems catching up with the top of the income distribution, as

<sup>21</sup>To define the groups of low, middle and high life expectancy or literacy rates we computed quantiles of life expectancy and literacy. The income groups are defined according to Holzmann et al. (2008).

long as they do not control a large amount of such an important resource as oil. But what is more important for our study is the fact that although these countries show a high level of income, whether caused by natural resources or not, they display lower life expectancies and lower literacy rates than their democratic counterparts.

Figure 1.2: Cross-Country Distribution of Adult Literacy Rates

Solid line: Kernel density estimator for countries being democratic in the given year. Dashed line: Kernel density estimator for countries being autocratic in the given year. 1970: 23 democracies and 77 autocracies; 1980: 25 democracies and 87 autocracies; 1990: 44 democracies and 68 autocracies; 2000: 70 democracies and 45 autocracies.

### **1.3.3 Panel Analysis**

The panel analysis aims at estimating the effect of democracy on life expectancy and literacy. As pre-estimation diagnostics indicate that heteroscedasticity and autocorrelation have to be dealt with, we run a cross-sectional time-series feasible generalized least squares regression

with panel-specific AR(1), addressing both issues simultaneously.<sup>22</sup> We do the estimation without fixed effects because fixed effects generally capture institutional, political and socioeconomic country characteristics, which are usually quite time-invariant. This is reflected in Table 5.15 in Appendix 1, which shows very little variation over time (within variation) in our democracy variables, particularly *Mpol*, and in the dependent variables life expectancy and literacy rate, but more variation across countries (between variation). Using fixed effects would therefore obscure the impact of our democracy variables on life expectancy and the literacy rate. Moreover, one cannot assume that democracy shows effects rapidly. Democracy needs time and stability to perform well (Keefer, 2005), in particular with respect to social indicators like life expectancy and literacy that change only incrementally.<sup>23</sup>
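The idea behind a panel-specific AR(1) correction can be illustrated with a small sketch. It is not a reproduction of the internals of the Stata routine used for the estimations; it assumes that the serial correlation parameter of each country is estimated from first-stage residuals and removed by a Prais-Winsten type transformation. Function names and residual values are hypothetical.

```python
import math


def estimate_rho(residuals):
    """AR(1) coefficient of one panel: regress e_t on e_{t-1} without a constant."""
    numerator = sum(prev * cur for prev, cur in zip(residuals, residuals[1:]))
    denominator = sum(prev * prev for prev in residuals[:-1])
    return numerator / denominator


def prais_winsten(series, rho):
    """Quasi-difference a series with rho, rescaling (not dropping) the first value."""
    first = math.sqrt(1.0 - rho * rho) * series[0]
    return [first] + [cur - rho * prev for prev, cur in zip(series, series[1:])]


# Hypothetical first-stage residuals for two countries (illustrative values only).
residuals_by_country = {
    "A": [8.0, 4.0, 2.0, 1.0],         # positively correlated errors
    "B": [1.0, -0.5, 0.25, -0.125],    # negatively correlated errors
}

transformed = {}
for country, residuals in residuals_by_country.items():
    rho = estimate_rho(residuals)  # one correlation parameter per panel
    transformed[country] = prais_winsten(residuals, rho)

print(transformed)
```

In a full feasible GLS procedure the same transformation would be applied to the dependent variable and to every regressor of the respective panel, and each panel would additionally be weighted by its own estimated error variance to account for heteroscedasticity.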

In a simple model, we try to explain life expectancy and literacy with our measures of democracy, controlling for GDP. GDP is lagged by one period to reduce the apparent problem of endogeneity. In addition to the measures of democracy and economic development, we include the literacy rate as a proxy for the population's ability to articulate their needs in the political arena and to control politicians' activities, and as a proxy for the population's priority for private spending on education and health. We also lag literacy by one period to reduce endogeneity problems. We only include education and its interaction with democracy in the model with life expectancy as our dependent variable. In line with our theoretical reasoning, we incorporate the lagged Gini coefficient to measure the effect of income inequality, and ethnic fractionalization as a proxy for social fragmentation.

As pointed out, all these variables describe conditions which potentially hamper or foster the functioning of democracy in terms of addressing the needs of the population. Thus, on the one hand, we are interested in their interaction with democracy. On the other hand, we want to know whether they have an effect on human development independently of the political system.
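These two questions can be written in a stylized specification. The notation is an illustrative assumption and is not taken from the estimated tables: $y_{it}$ denotes life expectancy or literacy in country $i$ and period $t$, $D_{it}$ a democracy measure (*Demexp* or *Mpol*), and $x_{it}$ one of the conditions, for example lagged GDP per capita.

$$
y_{it} = \alpha + \beta D_{it} + \gamma x_{it} + \delta \,(D_{it} \times x_{it}) + \varepsilon_{it},
\qquad
\frac{\partial y_{it}}{\partial D_{it}} = \beta + \delta x_{it} .
$$

In this form $\delta$ indicates whether the association between democracy and human development varies with the condition $x_{it}$, whereas $\gamma$ describes the association of the condition with human development where $D_{it} = 0$.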

Furthermore, we add a set of dummies for global regions<sup>24</sup> as well as year dummies to all regressions. The region dummies should capture much of the geographical, political and historical heterogeneity across the world. The inclusion of period effects allows us to capture overall upward trends in literacy and life expectancy that for example could be explained by technological improvements (Pritchett and Summers, 1996). Moreover, we control in both regressions for war, because it destroys lives as well as infrastructure for the provision of health and education services. Additionally, we control for HIV/AIDS in the life expectancy regressions. The AIDS dummy variable is interacted with the year dummies because HIV/AIDS was more of a problem for the more recent years in the sample and less in the earlier ones. A socialism dummy aims to capture heterogeneity across autocracies and an egalitarian tendency in those regimes.

<sup>22</sup>The Stata command xtgls is used. We assume that variance for each panel differs and that there is serial correlation where the correlation parameter is unique for each panel.

<sup>23</sup>Acemoglu et al. (2008) find a cross-country correlation between income and democracy only in the cross-section and attribute this to a long-term effect, i.e. positive changes in income and democracy over the past 500 years. According to this view, societies took divergent paths with respect to political and economic changes. This might be reflected here as well.

<sup>24</sup>Following the World Bank definition.
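The interaction of the AIDS dummy with the year dummies can be sketched as follows. The column names and the helper function are hypothetical; the sample years are those used in the estimations.

```python
YEARS = [1970, 1975, 1980, 1985, 1990, 1995, 2000]


def aids_year_interactions(aids_dummy, year):
    """One column per sample year: 1 only if the country is flagged and the year matches."""
    return {f"aids_x_{y}": int(aids_dummy == 1 and year == y) for y in YEARS}


# A flagged country observed in 1995, and an unflagged country in the same year.
flagged_1995 = aids_year_interactions(1, 1995)
unflagged_1995 = aids_year_interactions(0, 1995)
print(flagged_1995)
```

Because each interaction column has its own coefficient, the estimated association between high HIV prevalence and life expectancy is allowed to differ between the earlier and the later years of the sample.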

We estimate the model for the years 1970, 1975, 1980, 1985, 1990, 1995 and 2000 (and the preceding five year periods), as both literacy rate and life expectancy are not updated annually but roughly every five years, while being interpolated in the other years. Taking observations of every fifth year is preferred to averaging the five-year data, as averaging introduces additional serial correlation that hinders inference and estimation (Acemoglu et al., 2008).
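The construction of such a five-yearly panel with lagged regressors can be sketched as follows. The sketch assumes that one period corresponds to five years, so that a regressor lagged by one period is taken from the preceding sample point; the function, the field names and the data are hypothetical.

```python
SAMPLE_YEARS = [1970, 1975, 1980, 1985, 1990, 1995, 2000]


def five_year_panel(annual, lag=5):
    """Keep every fifth year and attach GDP from `lag` years earlier.

    `annual` maps year -> {"life_exp": ..., "gdp": ...} for one country.
    """
    rows = []
    for year in SAMPLE_YEARS:
        if year in annual and (year - lag) in annual:
            rows.append((year, annual[year]["life_exp"], annual[year - lag]["gdp"]))
    return rows


# Hypothetical annual series for one country (illustrative values only).
annual = {
    y: {"life_exp": 50 + (y - 1965), "gdp": 1000 + 10 * (y - 1965)}
    for y in range(1965, 2001)
}
panel = five_year_panel(annual)
print(panel)
```

Only the observations of the sample years enter the panel; the interpolated values of the years in between are not averaged.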

In case of life expectancy, we run separate regressions for non-OECD countries and the entire sample. For literacy, only the regression for the sub-sample of non-OECD countries makes sense as all OECD countries have an assumed constant level of literacy of exactly 99 percent in the UNDP data. The results are presented in Tables 1.1 to 1.3.

The results for the control variables are as expected in all specifications. The coefficients of the other main explanatory variables carry the expected signs and are highly significant, except for the Gini variable, which is insignificant in most cases. The coefficient of GDP per capita is positive, the literacy rate has a positive coefficient in the life expectancy regressions (remember that it is not included in the regressions where literacy is the dependent variable), and fractionalization carries a negative sign. All these results are robust to the choice of the democracy measure; they hold both for the fraction of democratic years (Demexp) and the average Polity2 score (Mpol). The coefficients of the year dummies are positive and highly significant for all years. They increase continuously over time and thus capture overall progress in human development due to, for instance, technology. The AIDS\*time dummies are negative and highly significant for 1990, 1995 and 2000. This result displays the tragedy of HIV/AIDS and its immense impact on life expectancy in many African countries during this period. The coefficient of the War dummy is negative and highly significant in the regressions with life expectancy as dependent variable and insignificant in the regressions with the literacy rate as dependent variable. The coefficient of the socialism dummy is positive and highly significant whenever included.


Table 1.1: Panel Analysis for All Countries (Dependent Variable: Life Expectancy at Birth)

\* p<0.05, \*\* p<0.01, \*\*\* p<0.001; dummies for global regions included and jointly significant


Table 1.2: Panel Analysis for Non-OECD Countries (Dependent Variable: Life Expectancy at Birth)

\* p<0.05, \*\* p<0.01, \*\*\* p<0.001; dummies for global regions included and jointly significant


Table 1.3: Panel Analysis for Non-OECD Countries (Dependent Variable: Adult Literacy Rate)

\* p<0.05, \*\* p<0.01, \*\*\* p<0.001; dummies for global regions included and jointly significant

There is a strong positive and highly significant correlation between our measures of human development and democracy in nearly all specifications (we will discuss the one exception below). The fraction of democratic years (Demexp) and the institutional maturity of a system measured by the mean of the polity2 score (Mpol) both are positively related to life expectancy at birth.

When it comes to the interaction effects of democracy with GDP per capita, ethnic fractionalization, inequality and literacy, respectively, the results are rather ambiguous. The interaction of GDP and democracy sometimes carries a positive sign and sometimes a negative sign depending on the measure of democracy and the countries included in the sample. In fact, it is insignificant in most cases. We conclude that there is no robust evidence for this interaction and thus democracy's performance does not seem to depend on the level of economic development. A similar argument holds true for the interaction of inequality and democracy. In the life expectancy regression, its coefficient is only positive and significant when Mpol is used as measure of democracy. In the literacy regression, the Gini interaction effect is only significant for one of the two democracy measures (Demexp) and thus not fully reliable. Contrary to the median voter prediction, it carries a negative sign in this case. The interaction of democracy and literacy is only significant for Mpol and not for Demexp. The interaction of democracy and ethnic fractionalization is significant in the life expectancy regressions for the full sample; it carries the expected negative sign. For the sample of non-OECD countries, it is only significant when Demexp is used as measure of democracy, both for literacy and life expectancy. Hence, there is more support for this interaction effect in the data than for the others, indicating that social fragmentation might disturb democracy's performance in a country.

Overall, there is only weak evidence for any of these interactions. The specifications excluding interaction effects are therefore the more reliable ones. This might also explain why there is no significant effect of democracy on literacy in the model including Mpol and all interaction effects. Summarizing, it can be said that a democracy's association with life expectancy and literacy is positive and robust but does not depend on the circumstances.

## **1.4 Conclusion**

We believe that our study has merit in explaining the linkage between democracy and human development. In our theoretical section, we clarified the causal channels through which democracy influences human development. In contrast to earlier studies, which put their focus on property rights, we emphasized the importance of the effects of redistribution and of public goods provision in a democracy. The statistical association between democracy and human development is investigated descriptively and analytically. Extending existing literature, we not only measure the association between democracy and human development, but we theoretically and empirically analyze conditions that are assumed to be important for the functioning of democracy in terms of improving the level of human development.

Empirically, the results show a strong and robust correlation between democracy and human development measured by life expectancy at birth and the literacy rate, even if one controls for the level of economic development and other important variables. Moreover, the effect is observed even if autocorrelation of the error terms is taken into account. Since controlling for autocorrelation also remedies the omitted variable problem, provided that the correlation of omitted variables with the right-hand-side variables is low, we can be confident that the results are indeed robust. The results show that people living in democratic systems do better than people in autocracies and, relying on the theoretical reasoning, that a population's well-being is influenced by the political system. Both the stability of a democratic system and its institutional maturity are relevant.

However, the observed effect might be traced back more to the cross-sectional variation than to the variation over time. This implies that causality is difficult to establish and that one can be less certain about the effect of other social and political factors which are very well proxied by democracy and do not change over time. Future studies should incorporate social capital as well as the degree of decentralization of the political-administrative system. Conducting a historical examination that begins at the time when democratic systems (in a modern sense) evolved would give more reliable results. In addition, it would certainly be an improvement of our analysis to empirically identify and model the channels that democracy takes before it affects human development, for example via public expenditures. Unfortunately, the data for this endeavor have not been available. Theoretical expectations about the precise conditions interacting with democracy in the creation of a healthy and literate society have not been met. The interaction of democracy and its other presumed conditions of functioning turned out to be insignificant or not robust to different democracy measures or samples. One could therefore conclude that the functioning of democracy, in terms of non-income human development improvements, is rather independent of GDP per capita, inequality, education and also ethnic fractionalization. But the missing robustness of our interaction effects does not permit any inferences.

GDP per capita, education and ethnic fractionalization influence non-income human development levels directly. A high level of economic development and education is related to a high level of non-income human development. High social fragmentation, on the contrary, is associated with lower levels of non-income human development. The results for income inequality are rather ambiguous and insignificant in most cases, which weakens the income inequality hypothesis according to which income inequality worsens well-being in a society.

To sum up: our empirical analysis cannot establish causality. However, based on the theoretical reasoning, the statistical associations suggest that what is important is democracy itself and only to a smaller extent the circumstances under which it occurs. First, living in a democracy is associated with better health and education, independently of the level of economic development in a country. Second, even if the picture here is more ambiguous, the positive association between democratic systems and human development seems to be rather independent of the circumstances. This stands in contrast to what the theoretical literature has told us. However, it can be considered good news for promoting democracy in poor, fragmented or uneducated societies.

Since income inequality did not play a major role in our estimations we found no supporting evidence for the median voter theory. This might be due to different degrees of inequality aversion in a country, although the region dummies in the regression analysis controlled for cultural factors that might capture differences in inequality aversion. Nevertheless, as democracy is positively associated with the well-being of a population, the main question of this paper deserves an affirmative answer. We thus cautiously support Sen's argument that democracy fulfils its "constructive" and "instrumental" role.

## **Chapter 2**

# **The Institutional Basis of Gender Inequality: The Social Institutions and Gender Index (SIGI)**<sup>1</sup>

## **2.1 Introduction**

Despite considerable progress in recent decades, gender inequality in the manifold dimensions of well-being remains pervasive in many developing countries. This is an intrinsic issue of equity as the affected women are deprived of their basic freedoms (Sen, 1999b). But going beyond this intrinsic feature of gender inequality, there is considerable evidence that it implies high costs for society in the form of lower human capital, worse governance, and lower growth (e.g. World Bank, 2001; Klasen, 2002; Klasen and Lamanna, 2009). The intrinsic and instrumental value of gender equality has been recognized and incorporated in the development agenda, for example in Millennium Development Goal 3 "Promote gender equality and empower women" as well as the Convention on the Elimination of All Forms of Discrimination against Women.

To measure the extent of this problem at the cross-country level several gender-related indices have been proposed, e.g. the Gender-Related Development Index (GDI) and the Gender Empowerment Measure (GEM) (United Nations Development Programme, 1995), the Global Gender Gap Index from the World Economic Forum (Lopez-Claros and Zahidi, 2005), the Gender Equity Index developed by Social Watch (2005) or the African Gender Status Index proposed by the Economic Commission for Africa (2004). These measures focus on gender inequality in well-being or in agency and they are typically outcome-focused (Klasen, 2006, 2007).

<sup>1</sup>Joint work with Boris Branisa and Stephan Klasen.

Focusing only on outcomes neglects the question of the origins of these inequalities and their great heterogeneity across space and time. Gender inequality is the result of human behavior, and how people behave and interact is influenced by institutions. Thus to understand gender inequality in outcomes, one needs to study the institutional basis of gender inequality.

There are several approaches to institutions. According to North (1990, p. 3 ff.) "institutions are the rules of the game in a society", they are "humanly devised constraints that shape human interaction". From an economics perspective, institutions are conceived as the result of collective choices in a society to achieve gains from cooperation by reducing uncertainty, collective action dilemmas and transaction costs. A sociological or cultural perspective, which is complementary to the rational choice one, relates institutions to culture. Institutions in this sense frame meanings and beliefs. People try to satisfy norms rather than to act individually within the rules of the game; that is, institutions do not channel the preferences of actors but influence these preferences and shape the role models and identities of the actors themselves. Legitimacy and appropriateness as well as cultural authority, power in a society and community dynamics might be more relevant in shaping such institutions, which become taken for granted without continuously being evaluated against efficiency considerations (Hall and Taylor, 1996, and references therein).

There is a particular type of institutions that is relevant for gender inequality, *social institutions related to gender inequality*. These institutions are more embedded in the cultural-sociological account although efficiency issues may also be important. We conceive these social institutions as long-lasting norms, values and codes of conduct that find expression in traditions, customs and cultural practices, informal and formal laws. They underlie gender roles and the distribution of power between men and women in the family, in the market and in social and political life. Consequently, they shape the social and economic opportunities of men and women, their autonomy in taking decisions (Dyson and Moore, 1983; Abadian, 1996; Hindin, 2000; Bloom et al., 2001) or their capabilities to live the life they value (Sen, 1999b). That is why they might affect important development outcomes and contribute to outcome gender inequalities (De Soysa and Jütting, 2007).

Three measures proxy in one way or another social institutions, which determine how women are treated in society: the Women's Political Rights index (WOPOL), the Women's Economic Rights index (WECON), and the Women's Social Rights index (WOSOC) of the CIRI Human Rights Data Project.<sup>2</sup> These indices take a human rights perspective and measure on a yearly basis whether a number of internationally recognized rights for women are included in law and whether the government enforces them. Of the three indices, WOSOC is the most encompassing measure covering social relations (Bjørnskov et al., 2009). However, it does not allow one to differentiate between different dimensions of social institutions. For example, it is important to distinguish between what happens within the family and what happens in public and social life. Furthermore, other shortcomings of all three indices are that they also cover outcomes of institutions, and they can only take four values from 0 (no rights) to 3 (legally guaranteed and enforced rights) which makes it difficult to compare and rank countries as there are many ties, i.e. equal scores, in the data.

<sup>2</sup>Information is available on the webpage of the project http://ciri.binghamton.edu/.

In this paper we propose new composite measures that proxy social institutions related to gender inequality in non-OECD countries which are based on variables of the OECD Gender, Institutions and Development (GID) database (Morrisson and Jütting, 2005; Jütting et al., 2008). These are the Social Institutions and Gender Index (SIGI) as a multidimensional measure of the deprivation of women and its five one-dimensional subindices Family code, Civil liberties, Physical integrity, Son preference and Ownership rights.

In general, the construction of composite measures requires several decisions, for example about the weighting scheme and the method of aggregation (e.g. Nardo et al., 2005). The subindices as one-dimensional measures are built using the method of polychoric principal component analysis to extract the common information of the variables corresponding to a subindex (Kolenikov and Angeles, 2009). When we combine the subindices to construct the SIGI, we use a reasonable methodology to capture the multidimensional deprivation of women caused by social institutions. The formula of the SIGI is inspired by the Foster-Greer-Thorbecke poverty measures (Foster et al., 1984) and offers a new way of aggregating gender inequality in several dimensions measured by the subindices. It is transparent and easy to understand, it penalizes high inequality in each dimension and allows only for partial compensation between dimensions.

The SIGI and the subindices are useful tools to compare the societal situation of women in over 100 non-OECD countries from a new perspective, allowing the identification of problematic countries and dimensions of social institutions that deserve attention by policy makers and need to be scrutinized in detail. Empirical results show that the SIGI provides additional information to that of other well-known gender-related indices. Moreover, regression analysis shows that the SIGI is related to indices that measure outcome gender inequality, even if one controls for region, religion and the level of economic development.

This paper is organized as follows. In section 2.2, we describe the OECD GID Database. Then, in sections 2.3 and 2.4 we focus on the construction of the subindices and of the SIGI. In section 2.5, we present empirical results by country, interesting regional patterns and a comparison between the SIGI and other gender-related measures. Furthermore, using regression analysis we illustrate the relevance of the SIGI for explaining outcome gender inequality. The last section concludes with a discussion of the strengths and weaknesses of the proposed measures.

## **2.2 The OECD Gender, Institutions and Development (GID) Database**

As input for the composite measures we use variables from the OECD GID database (Morrisson and Jütting, 2005; Jütting et al., 2008). This is a cross-country database covering about 120 countries with more than 20 variables measuring social institutions related to gender inequality.<sup>3</sup> These variables proxy social institutions through prevalence rates, legal indicators or indicators of social practices. We assume that the concept social institutions related to gender inequality is multidimensional. Following previous work done by the OECD (Jütting et al., 2008) we choose twelve variables that are assumed to measure each one of four dimensions of social institutions.

The *Family code* dimension refers to the private sphere with institutions that influence the decision-making power of women in the household. Family code is measured by the following four variables. *Parental authority* measures whether women have the right to be the legal guardian of a child during marriage, and whether women have custody rights over a child after divorce. *Inheritance* is based on formal inheritance rights of spouses. *Early marriage* measures the percentage of girls between 15 and 19 years of age who are/were ever married. *Polygamy* measures the acceptance of polygamy in the population. Countries where this information is not available are assigned scores based on the legality of polygamy.<sup>4</sup>

The public sphere is measured by the *Civil liberties* dimension that captures the freedom of social participation of women and includes the following two variables. *Freedom of movement* indicates the freedom of women to move outside the home. *Freedom of dress* is based on the obligation of women to use a veil or burqa to cover parts of their body in public.

The *Physical integrity* dimension comprises different indicators on violence against women. The variable *violence against women* indicates the existence of laws against domestic violence, sexual assault or rape, and sexual harassment. *Female genital mutilation* is the percentage of women who have undergone female genital mutilation. *Missing women* measures gender bias in mortality. Countries were coded based on estimates of gender bias in mortality for a sample of countries (Klasen and Wink, 2003) and on sex ratios of young people and adults.

<sup>3</sup>The data are available at the web-pages http://www.wikigender.org and http://www.oecd.org/dev/gender/gid.

<sup>4</sup>Acceptance of polygamy in the population might proxy actual practices better than the formal indicator legality of polygamy and, moreover, laws might be changed faster than practices. Therefore, the acceptance variable is the first choice for the subindex Family code. The reason for using legality when acceptance is missing is to increase the number of countries.

The *Ownership rights* dimension covers the economic sphere of social institutions proxied by the access of women to several types of property. *Women's access to land* indicates whether women are allowed to own land. *Women's access to bank loans* measures whether women are allowed to access credits. *Women's access to property other than land* covers mainly access to real property such as houses, but also any other property.

Concerning the *missing women* variable in the *Physical integrity* dimension, it could be argued that it reflects another dimension of gender inequality. Missing women is an extreme manifestation of son preference under scarce resources. An estimated 100 million women who would be alive if women were not discriminated against are missing (Sen, 1992; Klasen and Wink, 2003). The other components of *Physical integrity*, *violence against women* and *female genital mutilation*, measure particularly the treatment of women which is not only motivated by economic considerations. In the next section, we check with statistical methods whether *missing women* measures a different dimension than the variables *violence against women* and *female genital mutilation*.

These twelve variables take values between 0 and 1. The value 0 means no or very low inequality and the value 1 indicates high inequality. Three of the variables (*early marriage, female genital mutilation and violence against women*) are continuous. The other indicators measure social institutions on an ordinal categorical scale. The chosen variables cover around 120 non-OECD countries from all regions in the world except North America.<sup>5</sup> The choice of the variables is also guided by the availability of information so that as many countries as possible can be ranked by the SIGI. Within our sample 102 countries have information for all twelve variables.

## **2.3 Construction of the Subindices**

The objective of the subindices is to provide a summary measure for each dimension of social institutions related to gender inequality. In every subindex we want to combine variables that are assumed to belong to one dimension. The first step is to check the statistical association between the variables. The second step consists in aggregating the variables with a reasonable weighting scheme.

<sup>5</sup>The OECD Gender, Institutions and Development Database does not contain variables that capture relevant social institutions related to gender inequality in OECD countries.

### **2.3.1 Measuring the Association Between Categorical Variables**

To check the association between variables, most of which are ordinal, we use Kendall Tau b and Multiple Joint Correspondence Analysis (Greenacre, 2007; Nenadić, 2007). Kendall Tau b is a rank correlation coefficient. Rank correlation measures are useful when the data are ordinal and thus the conditions for using Pearson's correlation coefficient are not fulfilled. For each variable, the values are ordered and ranked. Then the correspondence between the rankings is measured.<sup>6</sup>

Taking into account tied pairs, the formula for Kendall Tau b is

$$\tau_b = \frac{C - D}{\sqrt{\left(\frac{n(n-1)}{2} - T_x\right)\left(\frac{n(n-1)}{2} - T_y\right)}},\tag{2.1}$$

where *C* is the number of concordant pairs, *D* is the number of discordant pairs, *n* is the number of observations, *n*(*n*−1)/2 is the number of all pairs, *T<sub>x</sub>* is the number of pairs tied on the variable *x* and *T<sub>y</sub>* is the number of pairs tied on the variable *y*. The notation is taken from Agresti (1984).
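Equation (2.1) can be sketched in a few lines of Python by counting pairs directly. This is only an illustration: the function name `kendall_tau_b` and the example scores are assumptions introduced here, not the authors' code or data.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall Tau b following equation (2.1), by brute-force pair counting."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = tied_x = tied_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            tied_x += 1          # pair tied on variable x
        if dy == 0:
            tied_y += 1          # pair tied on variable y
        if dx * dy > 0:
            concordant += 1      # both variables order the pair the same way
        elif dx * dy < 0:
            discordant += 1      # the variables order the pair in opposite ways
    n = len(x)
    all_pairs = n * (n - 1) / 2
    denominator = sqrt((all_pairs - tied_x) * (all_pairs - tied_y))
    if denominator == 0:
        raise ValueError("Tau b is undefined when a variable is constant")
    return (concordant - discordant) / denominator


# Hypothetical ordinal scores for three countries on two variables.
example_x = [0.0, 0.0, 1.0]
example_y = [0.0, 1.0, 1.0]
tau = kendall_tau_b(example_x, example_y)
```

In this toy example one pair is concordant, one is tied on *x* and one is tied on *y*, so the coefficient is 1 / √(2 · 2) = 0.5.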

As a second method to check the association between variables we examine the graphics produced by Multiple Joint Correspondence Analysis (MJCA) (Greenacre, 2007; Nenadić, 2007), after having discretized the three continuous variables. Correspondence Analysis is a method for analyzing and representing the structure of contingency tables graphically. We use MJCA to find out whether variables seem to measure the same dimension.<sup>7</sup>

The results for Kendall Tau b (Tables 5.17-5.21) are reported in Appendix 2. A significant positive value of Kendall Tau b is a sign of a positive association between two variables. This is the case for all variables belonging to one dimension, except *missing women* in the subindex *Physical integrity*. The graphs produced with MJCA (Figures 5.1-5.5) are also in Appendix 2.<sup>8</sup> The results of MJCA confirm that within every dimension all the variables seem to measure the same dimension, with the exception of *missing women* in the dimension *Physical integrity*. These results support the argument in section 2.2.

<sup>6</sup>For calculating Kendall Tau, one counts the number of concordant and discordant pairs of two rankings, takes the difference and divides this difference by the total number of pairs. A value of 1 means total correspondence of rankings, i.e. the rankings are the same. A value of -1 indicates reverse rankings or a negative association between rankings. A value of 0 means independence of rankings. Kendall Tau b is a variant of Kendall Tau that corrects for ties, which are frequent in the case of discrete data (Agresti, 1984, chap. 9). We consider Kendall Tau b to be the appropriate measure of rank correlation to find out whether our data are related.

<sup>7</sup>Correspondence Analysis is an exploratory and descriptive method to analyze contingency tables. Instead of calculating a correlation coefficient to capture the association of variables, the correspondence of conditional and marginal distributions of either rows or columns, also called row or column profiles, is measured using a χ<sup>2</sup>-statistic that captures the distance between them. These row or column profiles are then plotted in a low-dimensional space, so that the distances between the points reflect the dissimilarities between the profiles. Multiple Joint Correspondence Analysis is an extended procedure for the analysis of more than two variables and considers the cross-tabulations of the variables against each other in a so-called Burt matrix but with modified diagonal sub-tables. This makes it easier to see whether variables are associated. This is the case when they have similar deviations from homogeneity, and therefore get a similar position in a profile space (Greenacre, 2007; Nenadić, 2007).

We decide to use the variable *missing women* as a fifth subindex called *Son preference*. The artificially higher female mortality is one of the most important and cruel aspects of gender inequality and should not be neglected, as over 100 million women that should be alive are missing (Sen, 1992; Klasen and Wink, 2003). Missing women is the "starkest manifestation of the lack of gender equality" (Duflo, 2005).

### **2.3.2 Aggregating Variables to Build a Subindex**

The five subindices *Family code*, *Civil liberties*, *Son preference*, *Physical integrity* and *Ownership rights* use as input the twelve variables described in section 2.2, and each subindex measures one dimension of social institutions related to gender inequality. In the case of *Son preference*, the subindex takes the value of the variable *missing women*. In all other cases, the computation of the subindex values involves two steps.

In a first step, the method of polychoric principal component analysis is used to extract the common information of the variables corresponding to a subindex. Principal component analysis (PCA) is a method of dimensionality reduction that is valid for normally distributed variables (Jolliffe, 1986). This assumption is violated in this case, as the data include variables that are ordinal, and hence the Pearson correlation coefficient is not appropriate. Following Kolenikov and Angeles (2004, 2009) we use polychoric PCA, which relies on polychoric and polyserial correlations. These correlations are estimated with maximum likelihood, assuming that there are latent normally distributed variables that underlie the ordinal categorical data. We use the First Principal Component (FPC) as a proxy for the common information contained by the variables corresponding to the subindices. The first principal component is the weighted sum of the standardized original variables that captures as much of the variance in the data as possible.<sup>9</sup> The standardization of the original variables is done as follows. In the case of continuous variables, one subtracts the mean and then divides by the standard deviation. In the case of ordinal categorical variables, the standardization uses results of an ordered probit model. The weight that each variable gets in these linear combinations is obtained by analyzing the correlation structure in the data. The weights are shown in Table 2.1.

<sup>8</sup>The graphs produced with MJCA can be interpreted in the following way. In most cases, one of the axes represents whether there is inequality and the other axis represents the extent of inequality. If one connects the values of a variable one obtains a graphical pattern. If this is similar to the pattern obtained for another variable, then both variables are associated.

<sup>9</sup>The proportion of explained variance by the first principal component is 70% for *Family code*, 93% for *Civil liberties*, 60% for *Physical integrity* and 87% for *Ownership rights*.


Table 2.1: Weights from Polychoric PCA

In a second step, the subindex value is obtained by rescaling the FPC so that it ranges from 0 to 1 to ease interpretation. A country with the best possible performance (no inequality) is assigned the value 0 and a country with the worst possible performance (highest inequality) the value 1. Hence, the subindex values of all countries are between 0 and 1. Using the score of the FPC, the subindex is calculated with the following transformation. Country *X* corresponds to a country of interest, Country *Worst* corresponds to a country with worst possible performance and Country *Best* is a country with best possible performance.

$$\begin{array}{rcl} \text{Subindex(Country X)} &=& \dfrac{\text{FPC(Country X)}}{\text{FPC(Country Worst)} - \text{FPC(Country Best)}}\\[2ex] &-& \dfrac{\text{FPC(Country Best)}}{\text{FPC(Country Worst)} - \text{FPC(Country Best)}} \end{array} \tag{2.2}$$
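The rescaling in equation (2.2) is a min-max transformation between the best and worst possible FPC scores. A minimal Python sketch follows; the function name and the FPC scores are hypothetical values chosen for illustration and do not come from the GID data.

```python
def rescale_subindex(fpc_country, fpc_best, fpc_worst):
    """Rescale a first-principal-component score to [0, 1] as in equation (2.2).

    fpc_best and fpc_worst are the FPC scores of (possibly hypothetical)
    countries with the best and worst possible performance.
    """
    span = fpc_worst - fpc_best
    if span == 0:
        raise ValueError("best and worst FPC scores must differ")
    # Equivalent to FPC(X)/span - FPC(Best)/span.
    return (fpc_country - fpc_best) / span


# Hypothetical FPC scores, not taken from the data.
FPC_BEST, FPC_WORST = -1.0, 3.0
value_best = rescale_subindex(FPC_BEST, FPC_BEST, FPC_WORST)    # no inequality
value_worst = rescale_subindex(FPC_WORST, FPC_BEST, FPC_WORST)  # highest inequality
value_mid = rescale_subindex(1.0, FPC_BEST, FPC_WORST)
```

By construction the best possible performance maps to 0 and the worst possible performance to 1, and every other country falls in between.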

To check whether the subindices are empirically non-redundant, so that each of them provides additional information, we conduct an empirical analysis of the statistical association between them. In the case of well-being measures, McGillivray and White (1993) suggest using two explicit thresholds to separate redundancy from non-redundancy, namely correlation coefficients of 0.90 and 0.70. Based on this suggestion we use the threshold 0.80. In Table 2.2 we present Kendall Tau b as a measure of the statistical association between the five subindices. In all cases, the subindices are positively correlated, showing that they all measure social institutions related to gender inequality. It must be noted, however, that the correlation is not always statistically significant. Kendall Tau b is lower than 0.80 in all cases, which means that each subindex measures a distinct aspect of social institutions related to gender inequality.


Table 2.2: Kendall Tau b Between Subindices

## **2.4 The Social Institutions and Gender Index (SIGI)**

With the subindices described in the last section as input, we build a multidimensional composite index named Social Institutions and Gender Index (SIGI) which reflects the deprivation of women caused by social institutions related to gender inequality. The proposed index is transparent and easy to understand. As in the case of the variables and of the subindices, the index value 0 corresponds to no inequality and the value 1 to complete inequality.

The SIGI is an unweighted average of a non-linear function of the subindices. We use equal weights for the subindices, as we see no reason for valuing one of the dimensions more or less than the others.<sup>10</sup> The non-linear function arises because we assume that inequality in gender-related social institutions leads to deprivation experienced by the affected women, and that deprivation increases more than proportionally when inequality increases. Thus, high inequality is penalized in every dimension. The non-linearity also means that the SIGI does not allow for total compensation among subindices, but permits partial compensation. Partial compensation implies that high inequality in one dimension, i.e. subindex, can only be partially compensated with low inequality on another dimension.<sup>11</sup>

<sup>10</sup>Empirically, even in the case of equal weights the ranking produced by a composite index is influenced by the different variances of its components. The component that has the highest variance has the largest influence on the composite index. In the case of the SIGI the variances of the five components are reasonably close to each other, *Ownership rights* having the largest and *Physical integrity* having the lowest variance.

<sup>11</sup>Other approaches have also been proposed in the literature, e.g. the non-compensatory approach by Munda and Nardo (2005a,b).

For our specific five subindices, the value of the SIGI is then calculated as follows.

$$\begin{array}{rcl} \text{SIGI} &=& \frac{1}{5}\,(\text{Subindex Family Code})^2 + \frac{1}{5}\,(\text{Subindex Civil Liberties})^2 \\ &+& \frac{1}{5}\,(\text{Subindex Physical Integrity})^2 + \frac{1}{5}\,(\text{Subindex Son Preference})^2 \\ &+& \frac{1}{5}\,(\text{Subindex Ownership Rights})^2 \end{array}$$

Using a more general notation, the formula for the SIGI *I*(*X*), where *X* is the vector containing the values of the subindices *x<sub>i</sub>* with *i* = 1,...,*n*, is derived from the following considerations. For any subindex *x<sub>i</sub>*, we interpret the value 0 as the goal of no inequality to be achieved in every dimension. We define a deprivation function φ(*x<sub>i</sub>*,0), with φ(*x<sub>i</sub>*,0) > 0 if *x<sub>i</sub>* > 0 and φ(*x<sub>i</sub>*,0) = 0 if *x<sub>i</sub>* = 0 (e.g. Subramanian, 2007). Higher values of *x<sub>i</sub>* should lead to a penalization in *I*(*X*) that should increase with the distance of *x<sub>i</sub>* from zero. In our case the deprivation function is the square of the distance to 0 so that deprivation increases more than proportionally as inequality increases.

$$SIGI = I(X) = \frac{1}{n} \sum_{i=1}^{n} \phi(x_i, 0) = \frac{1}{n} \sum_{i=1}^{n} (x_i - 0)^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i)^2.$$

The formula is inspired by the Foster-Greer-Thorbecke (FGT) poverty measures (Foster et al., 1984). The general FGT formula is defined for *y<sub>i</sub>* ≤ *z* as:

$$FGT(Y,\alpha,z) = \frac{1}{n} \sum_{i=1}^{n} \left(\frac{z - y_i}{z}\right)^{\alpha},$$

where *Y* is the vector containing all incomes, *y<sub>i</sub>* with *i* = 1,...,*n* is the income of individual *i*, *z* is the poverty line, and α > 0 is a penalization parameter.

To compute the SIGI, the value 2 is chosen for α as the square function has the advantage of easy interpretation. With α = 2 the *transfer principle* is satisfied (Foster et al., 1984). In the context of poverty this principle means that a transfer from a person below the poverty line to a person less poor will raise poverty if the set of poor remains unchanged. In the case of the SIGI, the transfer principle means that an increase in inequality in one dimension, combined with a decrease of the same magnitude in another dimension where inequality is not higher, will raise the SIGI.
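As a worked check of this property under the quadratic deprivation function, suppose inequality rises by δ > 0 in dimension *j* and falls by the same δ in dimension *k*, where *x<sub>j</sub>* ≥ *x<sub>k</sub>* and both values stay within [0, 1]. The labels *j*, *k* and δ are introduced here for illustration only. The change in the index is then

$$\Delta I = \frac{1}{n}\left[(x_j + \delta)^2 + (x_k - \delta)^2 - x_j^2 - x_k^2\right] = \frac{2\delta}{n}\,(x_j - x_k + \delta) > 0,$$

so shifting inequality toward a dimension that is already at least as unequal raises the SIGI, which mirrors a regressive transfer among the poor in the FGT case.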

Some differences between the SIGI and the FGT measures must be highlighted. In the case of the SIGI, we are aggregating across dimensions and not over individuals. Moreover, in contrast to the income case, a lower value of *xi* is preferred, and the normalization achieved when dividing by the poverty line *z* is not necessary as 0 ≤ *xi* ≤ 1, *i* = 1,...,*n*.
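The relationship between the two formulas can be sketched in Python. This is an illustration under stated assumptions: the function names and the subindex values are hypothetical, and incomes above the poverty line are assumed to contribute zero to the FGT sum. Treating 1 − *x<sub>i</sub>* as a pseudo-income with poverty line *z* = 1 reproduces the SIGI as the FGT measure with α = 2.

```python
def sigi(subindices):
    """Unweighted average of squared subindex values (each in [0, 1])."""
    if not subindices:
        raise ValueError("at least one subindex value is required")
    for x in subindices:
        if not 0.0 <= x <= 1.0:
            raise ValueError("subindex values must lie between 0 and 1")
    return sum(x ** 2 for x in subindices) / len(subindices)


def fgt(incomes, poverty_line, alpha):
    """Foster-Greer-Thorbecke measure.

    Assumption for this sketch: incomes above the poverty line
    contribute a zero gap.
    """
    gaps = [max(0.0, (poverty_line - y) / poverty_line) for y in incomes]
    return sum(gap ** alpha for gap in gaps) / len(incomes)


# Hypothetical subindex values for one country (not from the GID data).
hypothetical_subindices = [0.2, 0.0, 0.4, 0.0, 0.6]
sigi_value = sigi(hypothetical_subindices)

# Pseudo-incomes 1 - x_i with poverty line z = 1 give normalized gaps x_i.
pseudo_incomes = [1.0 - x for x in hypothetical_subindices]
fgt_value = fgt(pseudo_incomes, poverty_line=1.0, alpha=2)
```

For these values the SIGI is (0.04 + 0 + 0.16 + 0 + 0.36) / 5 = 0.112, and the FGT computation on the pseudo-incomes gives the same number up to floating-point error.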

The SIGI fulfills several properties. For a formal presentation of the properties and the proofs, see Appendix 2.


To highlight the effects of partial compensation as compared to total compensation we computed the statistical association between the SIGI and a simple arithmetic average of the five subindices that allows for total compensation and compared the country rankings of both measures in Appendix 2.<sup>12</sup> The Pearson correlation coefficient between the SIGI and the simple arithmetic average of the five subindices is 0.96 and statistically significant showing a high correlation between both measures. However, when we compare the ranks of the SIGI with those obtained using a simple arithmetic average of the five subindices in Table 5.22 in Appendix 2, we observe that there are noticeable differences in the rankings of the 102 included countries. Examples are China and Nepal. China ranks in position 55 using the simple average, but worsens to place 83 in the SIGI ranking. Nepal has place 84 when the simple average is used, and improves to rank 65 in the SIGI ranking. For China, this is due to the high value on the subindex *Son preference*, which in the SIGI case cannot be fully compensated with relatively low values for the other subindices. For Nepal we observe the opposite case as all subindices have values reflecting moderate inequality.
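The rank reversal between the two aggregation rules can be reproduced with a small Python sketch. The two countries and their subindex values below are hypothetical and chosen only to illustrate the mechanism; they are not the values for China or Nepal.

```python
def simple_average(subindices):
    """Arithmetic mean of the subindices: allows total compensation."""
    return sum(subindices) / len(subindices)


def sigi(subindices):
    """Mean of squared subindices: allows only partial compensation."""
    return sum(x ** 2 for x in subindices) / len(subindices)


# Hypothetical countries: A has one very unequal dimension,
# B has moderate inequality in every dimension.
country_a = [0.1, 0.1, 0.1, 0.9, 0.1]
country_b = [0.3, 0.3, 0.3, 0.3, 0.3]

average_a, average_b = simple_average(country_a), simple_average(country_b)
sigi_a, sigi_b = sigi(country_a), sigi(country_b)

# Lower values mean less inequality, so the country with the lower score ranks better.
a_better_by_average = average_a < average_b
a_better_by_sigi = sigi_a < sigi_b
```

Country A looks better under the simple average (0.26 against 0.30) but worse under the SIGI (0.17 against 0.09), because its single high value is penalized more than proportionally.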

## **2.5 Results**

### **2.5.1 Country Rankings and Regional Patterns**

In Table 5.23 in Appendix 2, the results for the SIGI and its five subindices are presented. Among the 102 countries considered by the SIGI Paraguay, Croatia, Kazakhstan, Argentina and Costa Rica have the lowest levels of gender inequality related to social institutions. Sudan is the country that occupies the last position, followed by Afghanistan, Sierra Leone, Mali and Yemen, which means that gender inequality in social institutions is a major problem there.<sup>13</sup>

Rankings according to the subindices are as follows. For *Family code* 112 countries can be ranked. Best performers are China, Jamaica, Croatia, Belarus and Kazakhstan. Worst performers are Mali, Chad, Afghanistan, Mozambique and Zambia. In the dimension *Civil liberties* 123 countries are ranked. Among them 83 share place 1 in the ranking. Sudan, Saudi Arabia, Afghanistan, Yemen and Iran occupy the last five positions of high inequality. 114 countries can be compared with the subindex *Physical integrity*. Hong Kong, Bangladesh, Chinese Taipei, Ecuador, El Salvador, Paraguay and Philippines are at the top of the ranking while Mali, Somalia, Sudan, Egypt and Sierra Leone are at the bottom. In the dimension *Son preference* 88 out of 123 countries rank at the top as they do not have problems with missing women. The countries that rank worst are China, Afghanistan, Papua New Guinea, Pakistan, India and Bhutan. Finally, 122 countries are ranked with the subindex *Ownership rights*. 42 countries share position 1 as they have no inequality in this dimension. On the other hand, the four worst performing countries are Sudan, Sierra Leone, Chad and the Democratic Republic of Congo.

<sup>12</sup>We cannot compare the SIGI with the results of the non-compensatory index as proposed by Munda and Nardo (2005a,b). The algorithm used for calculating non-compensatory indices compares pairwise each country for each subindex. However, as our dataset includes many countries with equal values on several subindices, the numerical algorithm cannot provide a ranking.

<sup>13</sup>The subindices are computed for countries that have no missing values on the relevant input variables. In the case of the SIGI only countries that have values for every subindex are considered.


Table 2.3: Regional Pattern of the Composite Index and Subindices

ECA stands for Europe and Central Asia, LAC for Latin America and the Caribbean, EAP for East Asia and Pacific, SSA for Sub-Saharan Africa, and MENA for Middle East and North Africa.

To find out whether apparent regional patterns in social institutions related to gender inequality are systematic, we divide the countries into quintiles according to their scores on the SIGI and its subindices (Table 2.3). The first quintile includes countries with lowest inequality, and the fifth quintile countries with highest inequality.
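A quintile assignment of this kind can be sketched in Python as follows. The text does not specify how ties at quintile boundaries are handled, so this sketch assumes that countries are sorted by score and split into five groups of (nearly) equal size, with ties broken by input order. The country labels and scores are hypothetical.

```python
def assign_quintiles(scores):
    """Map each country to a quintile from 1 (lowest inequality) to 5 (highest).

    Assumption: quintiles are formed by rank position after sorting by score;
    tied scores are ordered by their position in the input dictionary.
    """
    ordered = sorted(scores, key=scores.get)
    n = len(ordered)
    return {country: (position * 5) // n + 1
            for position, country in enumerate(ordered)}


# Ten hypothetical countries with evenly spaced scores.
hypothetical_scores = {f"C{i:02d}": 0.05 * i for i in range(1, 11)}
quintiles = assign_quintiles(hypothetical_scores)
```

With ten countries, each quintile contains two of them: the two lowest scores fall in quintile 1 and the two highest in quintile 5.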

For the SIGI, no country of Europe and Central Asia (ECA) or Latin America and the Caribbean (LAC) is found in the two quintiles reflecting social institutions related to high gender inequality. In contrast, most countries in South Asia (SA), Sub-Saharan Africa (SSA), and Middle East and North Africa (MENA) rank in these two quintiles. It is interesting to note that in the most problematic regions two countries rank in the first two quintiles. These are Mauritius (SSA) and Tunisia (MENA). East Asia and Pacific (EAP) has countries in all five quintiles with Philippines, Thailand, Hong Kong and Singapore in the first quintile and China in the fifth quintile.

Turning to the subindices, the patterns are similar to that of the SIGI. As more information is available for the subindices, the number of countries covered by each subindex differs and is higher than for the SIGI. In the following some interesting facts are highlighted, especially countries whose scores differ from the average in their region.


• *Ownership rights*: Most problematic regions are SA, SSA and MENA. Nevertheless, there are cases in these regions that rank in the first quintile. These are Egypt, Israel, Kuwait and Tunisia (MENA), Bhutan (SA), and Eritrea and Mauritius (SSA).

### **2.5.2 Simple Correlation with other Gender-related Indices**

The SIGI is an important measure to understand gender inequality as it measures institutions that influence the basic functioning of society and explain gender inequality in outcomes. From this perspective, the SIGI adds value to other gender-related measures irrespective of whether it is empirically redundant, i.e. whether or not it provides additional information as compared to other measures.

Nevertheless, one can check whether the index is empirically redundant by computing the statistical association between the SIGI and other well-known gender-related indices. Relying on McGillivray and White (1993) we use a correlation coefficient of 0.80 in absolute value as the threshold to separate redundancy from non-redundancy.

We calculate the Pearson correlation coefficient and Kendall Tau b as a measure of rank correlation between the SIGI and each of the following indices: the Gender-related Development Index (GDI) and the Gender Empowerment Measure (GEM) from United Nations Development Programme (2006), the Global Gender Gap Index (GGG) from Hausmann et al. (2007) and the Women's Social Rights Index.<sup>14</sup> As the GDI and the GEM have been criticized in the literature (e.g. Klasen, 2006; Schüler, 2006), we also do the analysis for two alternative measures, the Gender Gap Index Capped (GGI) and a revised Gender Empowerment Measure (GEM2) based on income shares proposed by Klasen and Schüler (2009).<sup>15</sup> For all the indices considered both measures of statistical association are lower than 0.80 in absolute value and statistically significant (Table 2.4). We conclude that the SIGI is related to these gender measures but is non-redundant. The comparison of the country rankings of the SIGI and these other measures can be found in Table 5.24 in Appendix 2.

<sup>14</sup>Data obtained from http://ciri.binghamton.edu/.

<sup>15</sup>The Gender Gap Index Capped (GGI) is a geometric mean of the ratios of female to male achievements in the dimensions health, education and labor force participation. "Capped" means that every component is capped at one before calculating the geometric mean. This is necessary as a better relative performance of women, e.g. in the dimension health can be due to a risky behavior of men that should not be rewarded. GGI can be more directly interpreted as a measure of gender inequality while the GDI measures human development penalizing gender inequality. The GEM has three components, political representation, representation in senior positions in the economy, and power over economic resources. The most problematic component is power over economic resources proxied by earned incomes. This component measures female and male earned incomes using income levels adjusted for gender gaps but not the gender gaps themselves. The revised version GEM2 uses income shares of males and females.


Table 2.4: Statistical Association Between the SIGI and Other Gender-related Measures

Data for the Gender-related Development Index (GDI) and the Gender Empowerment Measure (GEM) are from United Nations Development Programme (2006) and are based on the year 2004. The Gender Gap Index Capped (GGI) and the revised Gender Empowerment Measure (GEM2) are taken from Klasen and Schüler (2009) and are based on the year 2004. Data for the Global Gender Gap Index (GGG) are from Hausmann et al. (2007). The Women's Social Rights Index (WOSOC) data correspond to the year 2007 and are obtained from http://ciri.binghamton.edu/. The p-values correspond to the null hypothesis that the SIGI and the corresponding measure are independent.
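The two measures of association used for Table 2.4 can be sketched in plain Python. This is an illustrative implementation under our own assumptions, not the code used for the reported results; the scores below are invented toy values and do not reproduce any figure in the table. Kendall's tau-b is used because it corrects for ties, which occur frequently in coarse indices.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def kendall_tau_b(x, y):
    """Kendall's tau-b: rank correlation with a correction for ties."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue            # tied on both variables: ignored
            elif dx == 0:
                ties_x += 1         # tied on x only
            elif dy == 0:
                ties_y += 1         # tied on y only
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    return (concordant - discordant) / sqrt((untied + ties_x) * (untied + ties_y))

# Invented scores for five countries (higher SIGI = more inequality).
sigi_scores = [0.05, 0.10, 0.10, 0.30, 0.60]
other_index = [0.75, 0.70, 0.72, 0.60, 0.55]
r = pearson(sigi_scores, other_index)
tau = kendall_tau_b(sigi_scores, other_index)
```

Both statistics are negative for the toy data because the invented comparison index falls as the SIGI rises.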

### **2.5.3 Regression Analysis**

The SIGI aims to measure the institutional basis of gender inequality. To explore whether the SIGI is associated with gender inequality in outcomes, controlling for other factors, we run linear regressions with two well-known indices of gender inequality as dependent variables and the SIGI as a regressor. We choose the Global Gender Gap Index (GGG) as the first response variable because it is an encompassing measure reflecting gaps in outcome variables related to basic rights such as health, economic participation and political empowerment. The second response variable is the ratio of GDI to HDI, a composite measure of gender inequality in the dimensions health, education and income. As the GDI is not really a measure of gender inequality, but measures human development penalizing gender inequality, UNDP recommends using the ratio of GDI to HDI.<sup>16</sup>

In both regressions we control for the level of economic development using the log of per capita GDP in constant prices (US\$, PPP, base year: 2005) (World Bank, 2008); for religion using a Muslim majority and a Christian majority dummy, the left-out category being countries that have neither a majority of Muslim nor a majority of Christian population (Central Intelligence Agency, 2009); and for geography and other unexplained heterogeneity that might go together with region using region dummies, the left-out category being Sub-Saharan Africa. As the number of observations is lower than 100, we use HC3 robust standard errors proposed by Davidson and MacKinnon (1993) to account for possible heteroscedasticity in our data.

<sup>16</sup>http://hdr.undp.org/en/statistics/indices/gdi\_gem/, date of access: April 16, 2010
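For reference, the HC3 covariance estimator is commonly written as follows. The notation is ours and does not appear in the text: $X$ is the matrix of regressors, $\hat{u}_i$ is the OLS residual of country $i$, and $h_{ii}$ is its leverage.

$$\widehat{\operatorname{Var}}_{\text{HC3}}(\hat{\beta}) = (X'X)^{-1} X' \operatorname{diag}\!\left(\frac{\hat{u}_i^2}{(1-h_{ii})^2}\right) X \,(X'X)^{-1}, \qquad h_{ii} = x_i'(X'X)^{-1}x_i$$

Dividing each squared residual by $(1-h_{ii})^2$ inflates the contribution of high-leverage observations, which is why this variant is often preferred in small samples such as ours.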


Table 2.5: Linear Regression with Dependent Variables GGG and Ratio GDI to HDI

Note: \*\*\* p<0.01, \*\* p<0.05, \* p<0.1 HC3 robust standard errors in brackets.

The regression results are presented in Table 2.5. The regression with GGG as dependent variable includes 72 countries and the coefficient of determination (adjusted *R*<sup>2</sup>) is 0.62. The SIGI is negatively associated with GGG and significant at the 1% level. The second regression with the ratio of GDI to HDI as dependent variable includes 78 countries and the corresponding adjusted *R*<sup>2</sup> is 0.43. The SIGI is again negatively associated with the response variable and this association is statistically significant at the 1% level. The results suggest that gender inequality in well-being and empowerment is strongly associated with social institutions that shape gender roles.

Even though we include control variables in the regressions, we cannot rule out omitted variable bias. However, as social institutions related to gender inequality are relatively stable and long-lasting, we consider that endogeneity does not pose a major problem. To check that our findings are not driven by observations that have large residuals and/or high leverage, we also run robust regressions, which yield similar results.<sup>17</sup>
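The iteratively reweighted least squares procedure behind these robust regressions (see footnote 17) can be sketched for a single regressor. This is a simplified illustration under our own assumptions: it uses Huber weights with the conventional tuning constant 1.345 and a scale based on the median absolute residual, which need not coincide with the weight functions described in Hamilton (1992). All function names and data are hypothetical.

```python
from statistics import median

def weighted_line(x, y, w):
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return my - b * mx, b

def robust_line(x, y, c=1.345, tol=1e-6, max_iter=100):
    """IRLS with Huber weights (an assumed weight function)."""
    w = [1.0] * len(x)                      # the first pass is plain OLS
    for _ in range(max_iter):
        a, b = weighted_line(x, y, w)
        abs_res = [abs(yi - a - b * xi) for xi, yi in zip(x, y)]
        # Robust residual scale; the fallback avoids division by zero.
        scale = median(abs_res) / 0.6745 or 1e-12
        new_w = [1.0 if r <= c * scale else c * scale / r for r in abs_res]
        change = max(abs(n - o) for n, o in zip(new_w, w))
        w = new_w
        if change < tol:                    # stop once the weights settle
            break
    return weighted_line(x, y, w)

# Toy data: an exact line y = 2 + 3x with one gross outlier at the end.
x = [float(i) for i in range(10)]
y = [2.0 + 3.0 * xi for xi in x]
y[-1] += 50.0

_, ols_slope = weighted_line(x, y, [1.0] * len(x))
_, robust_slope = robust_line(x, y)
```

In the toy data the outlier pulls the OLS slope well above 3, while the reweighting shrinks its influence, so the robust slope ends up closer to the slope of the underlying line.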

## **2.6 Conclusion**

In this paper we present composite indices that offer a new approach to gender inequality, which has been neglected in the literature and by other gender measures, which focus mainly on well-being and agency. Instead of measuring gender inequality in education, health, economic or political participation and other dimensions, the proposed measures proxy the underlying social institutions that are mirrored by societal practices and legal norms that might produce inequalities between women and men in developing countries.

Based on 12 variables of the OECD Gender, Institutions and Development (GID) Database (Morrisson and Jütting, 2005; Jütting et al., 2008) we construct five subindices each capturing one dimension of social institutions related to gender inequality: *Family code*, *Civil liberties*, *Physical integrity*, *Son preference* and *Ownership rights*. The Social Institutions and Gender Index (SIGI) combines the subindices into a multidimensional index of deprivation of women caused by social institutions related to gender inequality. With these measures over 100 developing countries can be compared and ranked.

When constructing composite indices one is always confronted with decisions and tradeoffs concerning for example the choice and treatment of the variables included, the weighting scheme and the aggregation method. We try to be transparent in our choices. As the subindices are intended each to proxy one dimension of social institutions, we use the method of polychoric PCA to extract the common element of the included variables (Kolenikov and Angeles, 2009). The methodology for constructing the multidimensional SIGI is based on the assumption that in each dimension deprivation of women increases more than proportionally when inequality increases, and that each dimension should be weighted equally. The formula of the SIGI is inspired by the FGT poverty measures (Foster et al., 1984) and has the advantage of penalizing high inequality in each dimension and only allowing for partial compensation among the five dimensions. We consider that the formula to compute the SIGI is easy to understand and to communicate.

However, some limitations of the subindices and the SIGI must be noted. First, a composite index depends on the quality of the data used as input. Social institutions related to gender inequality are hard to measure and the work accomplished by the OECD in building the GID database is an important step forward. It is worthwhile to continue this endeavor and invest more resources in the measurement of social institutions related to gender inequality. This includes data coverage, coding schemes and the refinement of indicators. It would be useful to exploit data available, for example from Demographic and Health Surveys (DHS)<sup>18</sup>, that specifically address the perception that women have of violence against women, and to finance surveys in countries where data are not available.

<sup>17</sup>Results are available upon request. The type of robust regression we perform uses iteratively reweighted least squares and is described in Hamilton (1992). A regression is run with ordinary least squares, then case weights based on absolute residuals are calculated, and a new regression is performed using these weights. The iterations continue as long as the maximum change in weights remains above a specified value.

Secondly, by aggregating variables and subindices, one inevitably loses some information. Figures and rankings according to the SIGI and the subindices should not substitute for a careful investigation of the variables from the database. Furthermore, to understand the situation in a given country additional qualitative information could be valuable.

Thirdly, one should keep in mind that OECD countries are not included in our sample, as social institutions related to gender inequality in these countries are not well captured by the 12 variables used for building the composite measures. This does not mean that this phenomenon is not relevant for OECD countries, but that further research is required to develop appropriate measures.

Nonetheless, the SIGI and its subindices offer a new perspective to understand gender inequality. Empirical results show that the SIGI is statistically non-redundant and adds new information to other well-known gender-related measures. The SIGI and the five subindices can help policy-makers to detect in which developing countries and in which dimensions of social institutions problems need to be addressed. For example, according to the SIGI scores, regions with highest inequality are South Asia, Sub-Saharan Africa, and the Middle East and North Africa. The composite measures can be valuable instruments to generate public discussion. Moreover, the SIGI and its subindices have the potential to influence current development thinking as they highlight social institutions that affect overall development. As is shown in the literature (e.g. Klasen, 2002; Klasen and Lamanna, 2009), gender inequality in education negatively affects overall development. Economic research investigating these outcome inequalities should consider social institutions related to gender inequality as possible explanatory factors. Results from regression analysis show that the SIGI is related to gender inequality in well-being and empowerment, even after controlling for region, religion and the level of economic development.

<sup>18</sup>Information is available on the webpage http://www.measuredhs.com/.

## **Chapter 3**

# **Why We Should All Care About Social Institutions Related to Gender Inequality**<sup>1</sup>

## **3.1 Introduction**

Institutions are a major factor explaining development outcomes. They guide human behavior and shape human interaction (North, 1990). Institutions are humanly devised to reduce uncertainty and transaction cost, they are rooted in culture and history and sometimes they are taken for granted and become beliefs (Hall and Taylor, 1996; De Soysa and Jütting, 2007). This study centers on a special type of institutions and their explanatory value for development outcomes: social institutions related to gender inequality.

It is an established fact that gender inequalities come at a cost. Besides the consequences that the affected women experience because they are deprived of their basic freedoms (Sen, 1999b), gender inequalities affect the whole society. They can lead to ill-health, low human capital, bad governance and lower economic growth (e.g. World Bank, 2001; Klasen, 2002). Gender inequalities can be observed in outcomes like education, health and economic and political participation, but they are rooted in gender roles that evolve from institutions that shape everyday life and form role models that people try to fulfill and satisfy. We refer to these long-lasting norms, values and codes of conduct as social institutions related to gender inequality.

We investigate the impact of these social institutions related to gender inequality on development outcomes, controlling for relevant determinants such as religion, political system, geography and the level of economic development. As development outcomes we choose indicators from the fields of education, demographics, health and governance. In particular, we use female secondary schooling, fertility rates, child mortality and governance in the form of rule of law and voice and accountability. We choose these indicators as they are related to economic development and allow us to find out whether social institutions related to gender inequality hinder progress in reaching the Millennium Development Goals.<sup>2</sup>

<sup>1</sup>Joint work with Boris Branisa and Stephan Klasen.

Most of the studies that have a similar research focus are conducted at the household level and proxy social institutions related to gender with measures of the autonomy or status of women (e.g. Abadian, 1996; Hindin, 2000). At the cross-country level data are scarce and therefore only a few studies are available that center on the development impact of gender-relevant social institutions (e.g. Morrisson and Jütting, 2005; Jütting et al., 2008).

Using the *Social Institutions and Gender Index* (SIGI) and its five subindices *Family code*, *Civil liberties*, *Physical integrity*, *Son preference* and *Ownership rights* proposed in Essay 2, we investigate whether social institutions related to gender inequality are associated with the chosen development outcomes at the cross-country level.<sup>3</sup> These indices cover between 102 and 123 developing countries and are built out of twelve variables of the OECD Gender, Institutions and Development Database that proxy social institutions through prevalence rates, indicators of social practices and legal indicators (Morrisson and Jütting, 2005; Jütting et al., 2008).<sup>4</sup> The five subindices of the SIGI each measure one dimension of social institutions related to gender inequality.<sup>5</sup> The *Family code* subindex captures institutions that directly influence the decision-making power of women in the household. It is composed of four variables that measure whether women have the right to be the legal guardian of a child during marriage and whether women have custody rights over a child after divorce, whether there are formal inheritance rights for wives, the percentage of girls between 15 and 19 years of age who are/have been married, and the acceptance of polygamy in the population.<sup>6</sup> The *Civil liberties* subindex covers the freedom of social participation of women and combines two variables, freedom of movement of women and freedom of dress, i.e. whether there is an obligation for women to use a veil or burqa to cover parts of their body in public. The *Physical integrity* dimension comprises two indicators of violence against women, the existence of laws against domestic and sexual violence and the percentage of women who have undergone female genital mutilation. The subindex *Son preference* measures the economic valuation of women and is based on a 'missing women' variable that measures an extreme form of preferring boys over girls based on information about the female population that has died as a result of gender inequality. The last subindex *Ownership rights* covers the access of women to several types of property: land, credit and property other than land. The values of the SIGI and of all the subindices are between 0 and 1. The value 0 means no or very low inequality and the value 1 indicates high inequality.

<sup>2</sup>In particular, goal 3 "Promote gender equality and empower women", goal 4 "Reduce child mortality" and goal 5 "Improve maternal health" are relevant here, although the other goals can be at least indirectly linked to our chosen indicators.

<sup>3</sup>As discussed in Essay 2, an alternative measure of social institutions would be the Women's Social Rights index (WOSOC) of the CIRI Human Rights Data Project (http://ciri.binghamton.edu/), which measures from a human rights perspective the type of institutions we are interested in. We prefer to work with the SIGI and its subindices and not with WOSOC as the latter also covers outcomes of these institutions and does not allow one to differentiate between dimensions of social institutions, e.g. between what happens within the family and what happens in public life. Moreover, WOSOC can only take four values, from 0 to 3, which makes it difficult to compare countries as there are many ties, meaning equal scores, in the data.

<sup>4</sup>The data are available at the web-pages http://www.wikigender.org and http://www.oecd.org/dev/gender/gid.

<sup>5</sup>To extract the common information of the variables used to construct one subindex the method of polychoric principal component analysis is used (Kolenikov and Angeles, 2009).

<sup>6</sup>Countries where this information is not available are assigned scores based on the legality of polygamy.

The SIGI combines the five subindices into a multidimensional measure of deprivation of women in a country. The underlying methodology of construction is inspired by the Foster-Greer-Thorbecke poverty measures (Foster et al., 1984). It leads to penalization of high inequality in each dimension and allows for only partial compensation between dimensions. The value of the SIGI is calculated as follows:

$$\begin{array}{rcl} \text{SIGI} &=& \frac{1}{5}\,(\text{Subindex Family Code})^2 + \frac{1}{5}\,(\text{Subindex Civil Liberties})^2 \\ &+& \frac{1}{5}\,(\text{Subindex Physical Integrity})^2 + \frac{1}{5}\,(\text{Subindex Son Preference})^2 \\ &+& \frac{1}{5}\,(\text{Subindex Ownership Rights})^2 \end{array}$$
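A short numerical sketch may help to see what the equal-weighted sum of squared subindices implies. The function name and the two sets of subindex values are hypothetical and serve only to illustrate the penalization of high inequality in a single dimension.

```python
def sigi(subindices):
    """Unweighted mean of the squared subindex values (each in [0, 1])."""
    if len(subindices) != 5:
        raise ValueError("expected five subindex values")
    if any(not 0.0 <= s <= 1.0 for s in subindices):
        raise ValueError("subindex values must lie in [0, 1]")
    return sum(s ** 2 for s in subindices) / 5

# Two hypothetical countries with the same average subindex value (0.2).
concentrated = sigi([1.0, 0.0, 0.0, 0.0, 0.0])   # all inequality in one dimension
spread = sigi([0.2, 0.2, 0.2, 0.2, 0.2])         # inequality spread evenly
```

Although both hypothetical countries have the same simple average across subindices, the country with extreme inequality in one dimension receives the higher SIGI value. This is the sense in which low inequality in four dimensions compensates only partially for high inequality in the fifth.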

The main shortcoming of these indices is that they cover only developing countries. This is due to the fact that the variables used as input do not measure relevant social institutions related to gender inequalities in OECD countries. Further research is required to develop appropriate measures for developed countries. Nevertheless, these social institutions indicators are innovative measures of the social, economic and political valuation of women that focus on the roots of gender inequalities and add information to other existing measures of gender inequality in well-being and empowerment.<sup>7</sup> The ranking of countries according to the SIGI and its subindices is presented in Appendix 2 belonging to Essay 2.

We proceed as follows. First, we look for relevant theories linking - at least implicitly - social institutions related to gender inequality with development outcomes such as health, demographics, education and the governance of a society. We refer to bargaining household models (e.g. Manser and Brown, 1980; McElroy and Horney, 1981; Lundberg and Pollak, 1993) and models considering the costs and returns of children (e.g. Becker, 1981; King and Hill, 1993; Hill and King, 1995) as well as to contributions from several disciplines on governance and democracy. These contributions focus on differences in behavior between men and women, and on women's movements as a countervailing power to personal rule (e.g. Swamy et al., 2001; Tripp, 2001). Secondly, we run several linear regressions with the outcome indicators as dependent variables and the SIGI and its subindices as the main explanatory variables. Our results show that social institutions related to gender inequality matter; higher inequality in social institutions is associated with lower development outcomes.<sup>8</sup>

<sup>7</sup>Examples are the Gender-Related Development Index (GDI) and the Gender Empowerment Measure (GEM) from United Nations Development Programme (1995), the Global Gender Gap Index from the World Economic Forum (Lopez-Claros and Zahidi, 2005), the Gender Equity Index developed by Social Watch (Social Watch, 2005), and the African Gender Status Index proposed by the Economic Commission for Africa (Economic Commission for Africa, 2004).

The rest of the paper is organized as follows. In section 3.2 we review existing theory on household decision-making and incorporate social institutions into the models, deriving hypotheses on their impact on female education, fertility and child mortality. In section 3.3, we formulate hypotheses on the impact of social institutions on rule of law, and voice and accountability based on the literature on governance, democracy and gender. The data are described in section 3.4. The empirical estimation and the results are presented in section 3.5. Section 3.6 concludes.

## **3.2 Social Institutions and Household Decisions**

In this section, we review the existing literature about the potential effects of social institutions related to gender inequality on development outcomes. It is beyond the scope of this study to develop a formal model that incorporates social institutions and specifies the exact functional relationships. Instead, we use the non-unitary approach to the household and the Net Present Value, which give hints on how social institutions operate at the household level. These approaches provide the necessary micro-foundation for the empirical analysis, which can only be conducted at the macro-level because of the available data.

Non-unitary household models show that household decisions are the result of the distribution of bargaining power in the household. Common to the non-unitary models, initiated by Manser and Brown (1980) and McElroy and Horney (1981), is a game-theoretic approach to the household. Husband and wife each have their own utility function, *U<sup>h</sup>*(*c<sup>h</sup>*) for the husband and *U<sup>w</sup>*(*c<sup>w</sup>*) for the wife, which depends on the consumption of private goods *c*.<sup>9</sup> They bargain over the allocation of resources to maximize their utility. If they do not reach agreement, they receive a payoff which corresponds to an individual 'threat point', *P<sup>h</sup>*(**S**,*Z*) and *P<sup>w</sup>*(**S**,*Z*), which comprises the utilities associated with non-agreement.<sup>10</sup> **S** and *Z* are defined below. The implication of non-unitary models is that household members do not simply pool resources and that inequality in power may cause inequality in outcomes (Kanbur, 2003; Pollak, 2003, 2007; Lundberg and Pollak, 2008).<sup>11</sup> Empirical evidence supports this (e.g. Thomas, 1997; Schultz, 1990; Haddad and Hoddinott, 1994; Rasul, 2008).

<sup>8</sup>In a related paper, Jütting and Morrisson (2009) follow the same econometric procedure we use here and study the impact of the SIGI and its subindices on gender inequality in labor market outcomes.

<sup>9</sup>Certainly, there are public goods in the household that both husband and wife consume within the marriage.

If husband and wife have to take decisions about their sons and daughters which will affect the future, then time needs to be considered. The Net Present Value (*NPV*) allows one to take into account present and future costs and returns to investments. To simplify the illustration we ignore that bargaining takes place and name the decision-maker 'parents'. The maximization of utility in a multi-period model leads parents to consider the costs and returns of the investment in their children (e.g. King and Hill, 1993). This private calculation of parents at period *t* = 0 can then be represented with the *NPV* of the investment in a child, with

$$NPV = \sum_{t=0}^{T} \frac{R(\mathbf{S},Z)_t - K(\mathbf{S},Z)_t}{(1+r)^t}$$

where *T* is the number of time periods considered, *R* represents the returns, *K* the costs of investments in a child, and *r* represents the discount rate. Like the threat point *P* in the non-unitary models, *R* and *K* are functions of **S** and *Z* that will be explained below. If the *NPV* is positive, parents decide to invest in a child. Gender inequality in the investments in boys and girls arises if the *NPV* of boys is larger than that of girls.<sup>12</sup>
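The parents' calculation can be made concrete with a small numerical sketch. All figures are invented for illustration: both children cost the same to educate, but the expected returns for the daughter are assumed to be lower, for example because of wage gaps on the labor market.

```python
def net_present_value(returns, costs, rate):
    """Sum of discounted net returns; period t = 0 is not discounted."""
    return sum((r - k) / (1.0 + rate) ** t
               for t, (r, k) in enumerate(zip(returns, costs)))

# Hypothetical three-period example with a discount rate of 20 percent.
costs = [100.0, 0.0, 0.0]                      # schooling costs paid up front
npv_son = net_present_value([0.0, 72.0, 86.4], costs, rate=0.2)
npv_daughter = net_present_value([0.0, 48.0, 57.6], costs, rate=0.2)
```

With these assumed numbers the investment in the son has a positive *NPV* and the investment in the daughter a negative one, so parents following this private calculation would invest only in the son.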

Finally, let us explain **S** and *Z*. **S** can be defined as 'extrahousehold environmental parameters' (McElroy, 1990) or 'gender-specific environmental parameters' (Folbre, 1997) that influence the threat point in the non-unitary household models and the *NPV* of a child. We consider that **S** can be best described as *social institutions related to gender inequality*. *Z* represents all other influential factors besides **S**.

### **3.2.1 Social Institutions and Female Education**

The following examples illustrate how social institutions related to gender inequality affect the private costs and returns of educational investments.<sup>13</sup> Social institutions related to gender inequality influence the costs of education as they shape a gendered division of labor and the opportunity costs of educating girls. Opportunity costs include income from child labor and are higher for girls when they are expected to do housework, to care for their younger siblings or to work in agriculture (Hill and King, 1995; Lahiri and Self, 2007). Social institutions related to gender inequality also affect the returns to education. The returns are generally lower for girls than for boys because girls and women are discriminated against on the labor market in the form of entry restrictions and wage gaps. Thus, boys are expected to be economically more productive. Furthermore, parents often expect only low returns from female education because the daughter marries and leaves the house, implying that the family loses her labor force. As a consequence, sons become the building block of their parents' old-age security (Hill and King, 1995; Pasqua, 2005; Song et al., 2006).<sup>14</sup>

<sup>10</sup>The threat point may be external to the marriage. In this case it corresponds to the individual's utility outside the family in case of divorce, as it is modeled in the divorce threat models of Manser and Brown (1980) and McElroy and Horney (1981). In the separate spheres bargaining models of Lundberg and Pollak (1993) the threat point is internal to the marriage and is the utility associated with a non-cooperative equilibrium within marriage given by traditional gender roles and social norms, where the spouses receive benefits due to the joint consumption of public goods.

<sup>11</sup>Using Nash bargaining, a solution to these non-unitary models can be found. Husband and wife maximize the Nash product function *N* = [*U<sup>h</sup>*(*c<sup>h</sup>*) − *P<sup>h</sup>*(**S**,*Z*)][*U<sup>w</sup>*(*c<sup>w</sup>*) − *P<sup>w</sup>*(**S**,*Z*)], subject to a pooled budget constraint. The result is the demand function *c<sup>i</sup>* = *f<sup>i</sup>*(*p*,*y*,**S**,*Z*) with *p* for prices, *y* for total household income and *i* = *w*,*h* (Lundberg and Pollak, 2008).

<sup>12</sup>See Pasqua (2005) who considers both perspectives, the non-unitary approach to the household and the cost and returns approach in the case of education of girls.

<sup>13</sup>It must be noted that the private *NPV* of investments in the education of children does not correspond to the social *NPV*. Social returns to education, especially female education, are often higher than the private ones. There is evidence that society benefits from female education as it contributes to overall development and drives economic growth (Hill and King, 1995; Klasen, 2002; Braunstein, 2007; Klasen and Lamanna, 2009). The resulting investment in female education will then often be sub-optimal.

<sup>14</sup>In addition to all of these considerations, social institutions related to gender inequality might affect the supply of schooling which might influence the decision to send girls to school if school environments are hostile to the needs of girls (e.g. no female teachers available, long distances to school or prices in favor of boys) (Hill and King, 1995; Alderman et al., 1996; Pasqua, 2005; Lahiri and Self, 2007).

The costs and returns perspective does not rule out that the distribution of decision-making power in the household matters. The non-unitary household approach can be used to explain low female education (Pasqua, 2005). Several empirical studies show that when women control more resources, investments in the education of girls are higher (e.g. Schultz, 2004; Emerson and Souza, 2007).

• *Hypothesis 1: Social institutions that deprive women of their autonomy and bargaining power in the household or that increase the private costs and reduce the private returns to investments into female education are associated with lower female education than in a more egalitarian environment.*

### **3.2.2 Social Institutions and Fertility and Child Mortality Rates**

Social institutions related to gender inequality that influence female decision-making power in the household and the *NPV* of the investment in girls in comparison to boys are also relevant for fertility levels and child mortality.

Concerning fertility, one can use the non-unitary household approach and argue that the net utility of a woman associated with having a child might differ from that of a man. If one assumes that man and woman derive the same satisfaction from having a child, the net utility a woman derives is lower than that of the man as she bears most of the costs of having children. These costs are related to the discomfort and health risks of pregnancy, and the income losses associated with time spent on child care. This might explain why women want fewer children than men, but cannot achieve their objectives as social institutions restrict their power in limiting the number of children born. Empirical studies support the hypothesis that reduced female bargaining power leads to shorter time spans between births, a lower use of contraceptives and higher fertility levels (Abadian, 1996; Hindin, 2000; Saleem and Bobak, 2005; Seebens, 2008).

The perspective of the *NPV* provides a second explanation for higher fertility. In the absence of well-functioning insurance markets and pension systems, parents in developing countries may need more children to feel secure. Depending on the costs of a child and the returns to the investment in a child, parents will consider having more children. As explained in the previous subsection on female education, social institutions related to gender inequality affect the *NPV* of investments in children. If these social institutions lower income earning opportunities for girls, the *NPV* of investments in girls will be lower than the *NPV* of investments in boys. Hence, sons hold the promise of more economic security than daughters. As long as parents cannot perfectly control the sex of their offspring, they will bear more children to increase the chance of having more sons (Abadian, 1996; Kazianga and Klonner, 2009).

To explain higher child mortality levels with social institutions that disadvantage women one has to bear in mind that mothers are usually the primary caregivers of children. Within the non-unitary framework, if mothers have only limited power in the household, they are constrained in the use of health care or in the access to food and other goods necessary for children. Thus, they cannot take care of their children as they would without those restrictions. This might lead to worse child health and higher child mortality rates (Thomas, 1997; Bloom et al., 2001; Smith et al., 2002; Maitra, 2004).

From the *NPV* perspective it might be rational for parents to invest more in the health and nutrition of boys than in girls who as a consequence could suffer more heavily from health problems and experience higher mortality rates than boys. It is possible that this behavior increases overall child mortality rates. In addition, the limited education that women typically receive in patriarchal societies as a result of past *NPV* calculations of their parents might also lead to worse child health and to higher child mortality figures (Schultz, 2002; Shroff et al., 2009).

• *Hypothesis 2: Social institutions that deprive women of their autonomy and bargaining power in the household or that increase the private costs and reduce the private returns of investments into girls are associated with higher fertility levels than in an egalitarian environment.*

• *Hypothesis 3: Social institutions that deprive women of their autonomy and bargaining power in the household or that increase the private costs and reduce the private returns of investments into girls are associated with higher child mortality than in an egalitarian environment.*

## **3.3 Social Institutions and the Society: Governance**

In societies where social institutions limit the rights of women, and where women's place is restricted to the private sphere, women have little or no say in the public and political domain. What is the impact of social institutions related to gender inequality on governance? We use the definition of governance by Kaufmann et al. (2008, p. 7) "as the traditions and institutions by which authority in a country is exercised. This includes the process by which governments are selected, monitored and replaced; the capacity of the government to effectively formulate and implement sound policies; and the respect of citizens and the state for the institutions that govern economic and social interactions among them."

There are at least two approaches that allow one to link social institutions with governance. First, there exist psychological and sociological explanations that state that women are less egoistic than men. According to these explanations, women are more risk-averse, they tend to follow the rules and they are more community-oriented than men (Dollar et al., 2001; Swamy et al., 2001). Countries in which women have more power would therefore have a political system that is more rule-oriented, responsive and accountable. Second, women's movements, being the answer to the exclusion of women from power, play an important role in increasing the quality of political systems by challenging e.g. personal rule (Waylen, 1993; Tripp, 2001). This argumentation suggests that countries with social institutions that hinder women from organizing and expressing their interests might lack an important oppositional force and therefore have a poor quality of governance.


## **3.4 Data**

Our investigation uses macro-data at the country level. Table 5.29 in Appendix 3 gives an overview of the variables used for our estimations, their definitions and the data sources. Descriptive statistics of the variables used are presented in Table 5.26 in Appendix 3. As main regressors we use the SIGI and its five subindices *Family code*, *Civil liberties*, *Physical integrity*, *Son preference* and *Ownership rights* in our estimations to check their explanatory value for the development outcomes female education, fertility, child mortality and governance.

First, we are interested in the impact of social institutions on female education, fertility and child mortality. As dependent variables we use *total fertility rates* from World Bank (2009) and *child mortality rates* from World Bank (2008). To measure education we choose *female gross secondary school enrollment rates* because this enables important functionings and empowers women. Furthermore, we assume that parents take into account that basic education of both boys and girls is necessary for fulfilling tasks related to the household. Data for secondary school enrollment are from World Bank (2009).

Second, we want to estimate the association between governance and our social institutions measures. We use the Governance Indicators developed by Kaufmann et al. (2008) and choose two of them to capture equality before the law, justice, tolerance and security as well as responsiveness, political openness and accountability in the political system. The *rule of law* index measures the extent to which contracts are enforced and property rights are ensured and the extent to which people trust in the state and respect the rules of the society. The *voice and accountability* index proxies civil and political liberties like freedom of expression, freedom of association, free media and the extent of active and passive political participation of citizens.

In all regressions we control for the level of economic development, religion, region and the political system in a country. The specific variables we use are:


Asia, *ECA* for Europe and Central Asia, *LAC* for Latin America and Caribbean, *EAP* for East Asia and Pacific);

• two political institutions variables, the *electoral democracy* variable and the *civil liberties index* from Freedom House (2008) that together measure liberal democracy which is assumed to be related to responsiveness to the needs of the public, political openness and tolerance in a country.<sup>15</sup>

We use different additional control variables in each regression following suggestions in the literature. In the fertility and child mortality regressions, we additionally control for


The Governance regressions exclude as control variables the civil liberties index from Freedom House as this index is used to build the voice and accountability index that we choose as dependent variable. We keep the electoral democracy variable because it does not pose a problem. We additionally include as control variables


<sup>15</sup>We multiply the civil liberties index by -1 to facilitate interpretation.

Social institutions, i.e. normative frameworks, change only slowly and incrementally. As the social institutions indicators are not expected to change much over time we have to decide which year or time span should be covered by the other variables. For our response variables we choose to take the average of the existing values over five or six years (2000-2005, 2001-2005). For the control variables we take the averages of the existing values over ten years (1996-2005).<sup>16</sup> The averages provide information that is more stable than the value for a particular year. Using a longer time span for the control variables than for the response variables allows us to capture possible time delays until effects can be observed. Nevertheless, we acknowledge that the choice of the time spans is arbitrary.
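The averaging over "existing values" can be illustrated with a minimal sketch. The yearly numbers, the variable names and the helper `period_average` are hypothetical and serve only to show how missing years are skipped and how the two windows differ; they are not taken from the data used in this study.

```python
from statistics import mean

def period_average(series, start, end):
    """Average the non-missing values of a {year: value} series over [start, end]."""
    values = [v for year, v in series.items()
              if start <= year <= end and v is not None]
    return mean(values) if values else None

# Hypothetical yearly values for one country (None marks a missing year).
fertility = {2000: 3.0, 2001: None, 2002: 2.8, 2003: 2.6, 2004: None, 2005: 2.4}
gdp = {year: 1000.0 + 50.0 * (year - 1996) for year in range(1996, 2006)}

response_avg = period_average(fertility, 2000, 2005)  # response variable window
control_avg = period_average(gdp, 1996, 2005)         # longer control variable window
```

Only the years with observed values enter each average, so a country with gaps in its series still contributes one value per variable.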

## **3.5 Empirical Estimation and Results**

### **3.5.1 Empirical Estimation**

We empirically test with linear regressions whether the composite measures reflecting social institutions related to gender inequality *s<sub>i</sub>* are associated with each of the response variables *y<sub>i</sub>*, representing the chosen development outcomes. We estimate regressions of the form

$$y_{i} = c + \beta s_{i} + \text{control variables}_{i} + \varepsilon_{i} \tag{3.1}$$

using information at the country level. We are mainly interested in testing the null hypothesis that the coefficient β is zero at a statistical significance level of α = 5%. If the null hypothesis is rejected, it is reasonable to infer that the measure proxying social institutions related to gender inequality does matter for the given response variable, as predicted in the hypotheses from sections 3.2 and 3.3.

The general procedure used for each of the response variables consists of two steps. First, we start by examining the effect of SIGI. We begin our estimation with a simple linear regression with SIGI as the only regressor *s<sub>i</sub>*. We then run a multiple linear regression adding the main group of control variables that consists of the level of economic development, region dummies, religion dummies and the political system variables. If SIGI is significant in this regression, we continue and, if applicable, estimate the complete model with all identified control variables to confirm whether SIGI remains significant.

As SIGI is a rather broad measure to rank and compare countries and policy implications are difficult to derive from it, in a second step we focus on the subindices to get a more precise idea about what kind of social institutions might be related to the chosen development outcomes. We estimate the same multiple linear regression(s) described above using the five subindices *s<sub>i</sub>* one at a time instead of SIGI to explore which dimension of social institutions related to gender inequality seems to be the most relevant. In the corresponding regression tables we only report the specification with the subindex or subindices that are statistically significant. It must be noted that we keep and show even those control variables that are not statistically significant in the regression, as we want to stress that the social institutions indices are associated with the development outcomes even if we include these control variables.

<sup>16</sup>The ethnic fractionalization variable is constant over time as changes in the ethnic composition of a country at least over 20 and 30 years are rare (Alesina et al., 2003).

All regressions are estimated with Ordinary Least Squares (OLS). Regression diagnostics not reported here suggest that heteroscedasticity is a possible issue in our data and that there are influential observations that could drive our results. Concerning the first issue, it is known that if the model is well specified, the OLS estimator of the regression parameters remains unbiased in the presence of heteroscedasticity, but the estimator of the covariance matrix of the parameter estimates can be biased and inconsistent, making inference about the estimated regression parameters problematic. Violations of homoscedasticity can lead to hypothesis tests that are not valid and confidence intervals that are either too narrow or too wide. To deal with heteroscedasticity, we use 'heteroscedasticity-consistent' (HC) standard errors. This means that while the parameters are still estimated with OLS, alternative methods of estimating the standard errors that do not assume homoscedasticity are applied. As the samples we use contain fewer than 150 observations, we use HC3 robust standard errors proposed by Davidson and MacKinnon (1993), which perform better in small samples. These are the standard errors that are presented in the regression Tables 3.1-3.5. Simulation studies by Long and Ervin (2000) have shown that HC standard error estimates tend to maintain test size closer to the nominal alpha level in the presence of heteroscedasticity than OLS standard error estimates that assume homoscedasticity. These authors recommend the use of HC3 robust standard errors, especially for sample sizes below 250, as they can keep the test size at the nominal level regardless of the presence or absence of heteroscedasticity, with only a minor loss of power when the errors are indeed homoscedastic.<sup>17</sup>
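As an illustration of the estimator described above, the following sketch computes OLS coefficients together with HC3 standard errors, where each squared residual is divided by (1 - h<sub>ii</sub>)<sup>2</sup> and h<sub>ii</sub> is the leverage of observation i. The data are simulated, and the function name `ols_hc3` and all variable names are assumptions made for this example; it is not the code used for Tables 3.1-3.5.

```python
import numpy as np

def ols_hc3(X, y):
    """OLS coefficients with HC3 heteroscedasticity-consistent standard errors.

    X must already contain a column of ones for the intercept.
    """
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # Leverage h_ii: diagonal of the hat matrix X (X'X)^-1 X'.
    leverage = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    # HC3 rescales each squared residual by 1 / (1 - h_ii)^2.
    weights = (resid / (1.0 - leverage)) ** 2
    meat = X.T @ (X * weights[:, None])
    cov = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(cov))

# Simulated data (an assumption): the error variance grows with s.
rng = np.random.default_rng(0)
n = 120
s = rng.uniform(0.0, 1.0, n)      # stand-in for a social institutions index
control = rng.normal(size=n)      # stand-in for one control variable
y = 1.0 - 2.0 * s + 0.5 * control + rng.normal(scale=0.2 + s, size=n)
X = np.column_stack([np.ones(n), s, control])

beta, se_hc3 = ols_hc3(X, y)
t_stat = beta[1] / se_hc3[1]      # test statistic for H0: beta = 0
```

The point estimates are the usual OLS estimates; only the covariance matrix changes, which is why the coefficients in a table stay the same while the standard errors in brackets differ from the classical ones.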

In addition, we use the bootstrap with 1000 replications to compute a bias-corrected and accelerated (BCa) 95% confidence interval for the regression coefficients computed with OLS (Efron and Tibshirani, 1993). One of the main advantages of bootstrapping methods is that no assumptions about the sampling distribution or about the statistic are needed. The results are not reported here, but are available upon request, and confirm that all the coefficients that are significant at the 5% level in Tables 3.1-3.5 remain significant when using BCa 95% confidence intervals around them.
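A BCa interval of this kind can be sketched as follows. The example resamples whole observations (countries) with replacement, which is one common choice and an assumption here, since the text does not state the resampling scheme. The data are simulated and the helper names `slope` and `bca_interval` are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def slope(X, y):
    """OLS coefficient on the second column of X (the regressor of interest)."""
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

def bca_interval(X, y, n_boot=1000, level=0.95, seed=0):
    """BCa bootstrap confidence interval for the OLS slope, resampling rows."""
    rng = np.random.default_rng(seed)
    n = len(y)
    theta_hat = slope(X, y)
    boot = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)           # draw n rows with replacement
        boot[b] = slope(X[idx], y[idx])
    norm = NormalDist()
    # Bias correction: share of bootstrap estimates below the original estimate.
    z0 = norm.inv_cdf(np.mean(boot < theta_hat))
    # Acceleration from leave-one-out (jackknife) estimates.
    jack = np.array([slope(np.delete(X, i, axis=0), np.delete(y, i))
                     for i in range(n)])
    d = jack.mean() - jack
    a = np.sum(d ** 3) / (6.0 * np.sum(d ** 2) ** 1.5)
    bounds = []
    for p in ((1 - level) / 2, (1 + level) / 2):
        z = norm.inv_cdf(p)
        adjusted = norm.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))
        bounds.append(np.quantile(boot, adjusted))
    return tuple(bounds)

# Simulated data (an assumption) with a clearly negative slope.
rng = np.random.default_rng(0)
n = 80
s = rng.uniform(0.0, 1.0, n)
y = 1.0 - 2.0 * s + rng.normal(scale=0.1 + 0.5 * s, size=n)
X = np.column_stack([np.ones(n), s])

lo, hi = bca_interval(X, y)
```

A coefficient counts as significant at the 5% level under this check if the interval from `lo` to `hi` excludes zero.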

<sup>17</sup>Certainly, heteroscedasticity-consistent standard errors are not a panacea for inferential problems under heteroscedasticity. As pointed out by some authors, there are limitations and trade-offs in these estimators (e.g. Kauermann and Carroll, 2001; Wilcox, 2001).


To deal with the second issue and check whether influential observations drive the results, we take the estimates of a regression obtained with OLS with standard variance estimator to detect the observations with unusual influence or leverage based on Cook's distance. Cook's distance is a commonly used estimate of the influence of a data point when doing least squares regression. We exclude countries from the sample if the value of Cook's distance is larger than 4/*n*, with *n* being the number of observations, and re-estimate each regression on the restricted sample with HC3 robust standard errors. In all the cases we confirm that even after we exclude influential observations, the results remain basically unchanged.<sup>18</sup> The regressions are not reported here, but are available upon request.
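The exclusion rule based on Cook's distance can be sketched in a few lines. The data are simulated with one planted outlier, and the function name `cooks_distance` is an assumption made for this example. The distance for observation i is computed from the OLS residual, the leverage h<sub>ii</sub>, the number of parameters p and the classical residual variance.

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance for each observation of an OLS fit (X includes the intercept)."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    leverage = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    s2 = resid @ resid / (n - p)              # classical residual variance
    return resid ** 2 / (p * s2) * leverage / (1.0 - leverage) ** 2

# Simulated data (an assumption) with one planted high-leverage outlier.
rng = np.random.default_rng(1)
n = 60
s = rng.uniform(0.0, 1.0, n)
y = 1.0 - 2.0 * s + rng.normal(scale=0.3, size=n)
s[0], y[0] = 3.0, 5.0
X = np.column_stack([np.ones(n), s])

d = cooks_distance(X, y)
keep = d <= 4.0 / n                           # drop observations above 4/n
X_restricted, y_restricted = X[keep], y[keep]
```

The restricted sample `X_restricted`, `y_restricted` would then be used to re-estimate the regression with robust standard errors.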

We consider that the model specification is reasonable. However, possible endogeneity of our main regressors *s<sub>i</sub>* (the SIGI and its subindices) should be taken into account when interpreting the coefficients of *s<sub>i</sub>* as they would be biased and inconsistent in this case. Endogeneity is given if *s<sub>i</sub>* is correlated with the disturbance *ε<sub>i</sub>* in equation 3.1. There are three sources of endogeneity: omitted variables, measurement error and simultaneity (Wooldridge, 2002). We have included control variables to minimize omitted variable bias, although it is impossible to completely rule out this problem. Concerning measurement error, we regard the SIGI and the subindices as adequate proxies of social institutions related to gender inequality. It is not very plausible that there are errors in measurement that are related to the unobserved social institutions. The last source, simultaneity, arises when *s<sub>i</sub>* is determined simultaneously with *y<sub>i</sub>*. We consider that social institutions related to gender inequality *s<sub>i</sub>* are relatively stable and long-lasting. Therefore, we think it is unlikely that the response variables *y<sub>i</sub>* influence *s<sub>i</sub>*.<sup>19</sup>

### **3.5.2 Results**

<sup>18</sup>As an alternative procedure we use robust regression with iteratively reweighted least squares as described in Hamilton (1992), and confirm that results are similar.

<sup>19</sup>Social institutions are hard to measure. Therefore, sometimes one has to rely on legal indicators to proxy them, although we acknowledge that this could pose problems as there is for example an international mechanism, the Convention on the Elimination of All Forms of Discrimination against Women (CEDAW), that aims at changing social institutions through legal measures. However, the impact of CEDAW on national legislation depends on the willingness of governments to sign and ratify it without reservation and on their willingness and ability to enact the new laws. Given the constituting function of social institutions for a society this could be difficult and depends on many factors.

Before we run the regressions it is necessary to check first the correlation between the subindices to rule out redundancy, and secondly the correlation between the subindices and the control variables to check whether the social institutions indices are proxies for these control variables. The Pearson correlation coefficient between the subindices is always positive, but not always significant. The correlation coefficients are always lower than 0.6, with the exception of the correlation between the subindices *Family code* and *Ownership rights*, which is equal to 0.74 (Table 5.27).<sup>20</sup> Table 5.28 shows that the absolute value of the Pearson correlation coefficient between the social institutions indicators and the control variables is always lower than 0.6, except for the correlations of the SIGI and of the subindices *Family code* and *Ownership rights* with the two variables capturing literacy of the whole population and of the female population.
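The redundancy check described above amounts to computing pairwise Pearson correlations and flagging those at or above 0.6 in absolute value. The sketch below uses simulated stand-ins for three subindices; the variable names and values are assumptions for illustration and do not reproduce Tables 5.27 or 5.28.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
common = rng.normal(size=n)
# Hypothetical stand-ins for three subindices; the first two share a common component.
indices = {
    "family_code": common + 0.3 * rng.normal(size=n),
    "ownership_rights": common + 0.3 * rng.normal(size=n),
    "son_preference": rng.normal(size=n),
}
names = list(indices)
corr = np.corrcoef(np.vstack([indices[k] for k in names]))

# Collect every pair whose absolute Pearson correlation reaches the 0.6 threshold.
flagged = [(names[i], names[j], corr[i, j])
           for i in range(len(names))
           for j in range(i + 1, len(names))
           if abs(corr[i, j]) >= 0.6]
```

Pairs that end up in `flagged` are candidates for redundancy and would not be entered together without further thought.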



\* *p* < 0.05, \*\* *p* < 0.01, \*\*\* *p* < 0.001

HC3 robust standard error in brackets.

Regression (2) and (3) with controls for economic development, geography, religion and political system. In this case, this specification corresponds to the complete specification.

Regression results using *female secondary education* as dependent variable are presented in Table 3.1. Regression (1) with SIGI as the only regressor yields a negative and statistically significant association. Higher levels of inequality are associated with lower levels of female secondary education. The association vanishes in regression (2) if one includes the level of economic development, religion, region and the political system as control variables. Using the subindex *Family code* instead of SIGI as the main regressor in regression (3) shows a different picture. The subindex is statistically significant even if the control variables are included. The adjusted coefficient of determination *R*<sup>2</sup> is 0.78. Hence, we find no evidence against Hypothesis 1 that states that social institutions related to high gender inequality are negatively associated with female education.<sup>21</sup>

<sup>20</sup>Table 2.2 of Essay 2 shows Kendall Tau b between the five subindices and confirms that they are positively correlated, albeit not perfectly.


Table 3.2: Linear Regressions with Dependent Variable Fertility

\* *p* < 0.05, \*\* *p* < 0.01, \*\*\* *p* < 0.001

HC3 robust standard error in brackets.

Regression (2) and (3) with minimum of controls for economic development, geography, religion and political system. Regression (4) with complete specification for fertility.

Results obtained using *total fertility rate* and *child mortality* as response variables are shown in Tables 3.2 and 3.3. In both cases, the simple linear regression (1) using SIGI as the only regressor shows a positive and statistically significant association between SIGI and the response variable. Higher levels of inequality are associated with higher levels of fertility and with higher levels of child mortality. However, once control variables related to the level of economic development, religion, region and the political system in a country are included in regression (2), SIGI is no longer statistically significant. This is not the case when we use the subindex *Family code* as the main regressor, as it is significant in regression (3) which uses the same control variables, and even in regression (4) which adds two additional regressors: the share of literate adult female population and a dummy reflecting high adult HIV/AIDS prevalence. In regression (4) the obtained adjusted *R*<sup>2</sup> is 0.84 for fertility and 0.82 for child mortality. Hence, we cannot reject Hypotheses 2 and 3, suggesting that social institutions related to high gender inequality are associated with higher fertility levels and higher child mortality.<sup>22</sup> As the subindex *Family code* is the relevant social institutions measure in our empirical estimations it seems that social institutions that deprive women of their autonomy and bargaining power in the family and that might restrict women's possibilities outside the family do matter for female education, fertility and child mortality.

<sup>21</sup>Regressions not reported here, but available upon request, using primary gross completion rates obtained from World Bank (2008) instead of female secondary schooling as the dependent variable yield similar results.

Table 3.4 shows the results obtained for the dependent variable *voice and accountability*. Regression (1) with SIGI as the only regressor shows a negative and statistically significant association: higher levels of gender inequality are associated with lower levels of voice and accountability. This association remains significant in regression (2) where we add the level of economic development, religion, region and the political system<sup>23</sup> as control variables, and in the complete specification shown in regression (3) where we additionally include the proportion of seats held by women in national parliaments, the literacy rate of the population, a measure of openness of the economy, and a measure of ethnic fractionalization. In regression (3), we obtain an adjusted *R*<sup>2</sup> of 0.69. We explore which dimension of social institutions related to gender inequality is behind this result and find that it is the subindex *Civil liberties*. The specifications with the subindex *Civil liberties* in regressions (4) and (5) show that this subindex is negatively associated with voice and accountability and that this association is statistically significant even with the control variables. In regression (5) the adjusted *R*<sup>2</sup> is 0.69. Hypothesis 4 cannot be rejected with this evidence, which suggests that social institutions related to gender inequality inhibit the building blocks of good governance in the form of voice and accountability. The subindex *Civil liberties* is the relevant social institutions measure in our empirical estimations. The freedom of women to participate in public life seems to increase the quality of governance of a society. Relating back to theory, this could be due to the behavior of women as they tend to be more socially oriented than men and are a group that cross-cuts cleavages in general.

<sup>22</sup>Regressions not shown here, but available upon request, confirm the results concerning mortality rates when infant mortality rates taken from World Bank (2008) are used instead of child mortality rates.

<sup>23</sup>Recall that in the governance regressions we only include the electoral democracy variable of Freedom House (2008) as the civil liberties index is included in the chosen governance indicators which are now the response variables.


Table 3.3: Linear Regressions with Dependent Variable Child Mortality

\* *p* < 0.05, \*\* *p* < 0.01, \*\*\* *p* < 0.001

HC3 robust standard error in brackets.

Regression (2) and (3) with controls for economic development, geography, religion and political system. Regression (4) with complete specification for child mortality.


Table 3.4: Linear Regressions with Dependent Variable Voice and Accountability

HC3 robust standard error in brackets.

Regression (2) and (4) with controls for economic development, geography, religion and political system. Regressions (3) and (5) with complete specification for governance/voice and accountability.


Table 3.5: Linear Regressions with Dependent Variable Rule of Law

HC3 robust standard error in brackets.

Regression (2), (4) and (6) with controls for economic development, geography, religion and political system. Regressions (3), (5) and (7) with complete specification for governance/rule of law.

Results for the other component of governance, *rule of law*, are shown in Table 3.5, providing evidence for Hypothesis 5. Regression (1) shows a negative and statistically significant association between SIGI and rule of law: higher levels of inequality are associated with lower levels of rule of law. This association remains significant in regression (2) where we add the level of economic development, religion, region and the political system as control variables, and in the complete specification in regression (3) where we additionally include the proportion of seats held by women in national parliaments, the literacy rate of the population, a measure of openness of the economy, and a measure of ethnic fractionalization. In this last regression, we obtain an adjusted *R*<sup>2</sup> of 0.51. Again, we are interested in exploring which dimension of social institutions related to gender inequality is the relevant one for rule of law, and find that two subindices matter: *Ownership rights* and *Civil liberties*.<sup>24</sup> The specifications with the subindices yield similar results to those of the SIGI and are presented in regressions (4) and (5) for *Ownership rights* and (6) and (7) for *Civil liberties*. For both subindices the adjusted *R*<sup>2</sup> obtained for the complete specification is 0.56. As postulated in Hypothesis 5, social institutions related to gender inequality seem to matter for governance, inhibiting the rule of law, e.g. through personal rule and inequality in justice. Assuming that women's attitudes are different from those of men and that they challenge injustice, women's power in a society contributes to improving the rule of law. The two subindices proxy where this power comes from, with *Ownership rights* measuring economic power through access to property and *Civil liberties* measuring the freedom to participate in and to shape public life.

A reasonable question is whether the social institutions indicators are capturing different religions. In the regressions reported here, we control for religion using a Christian and a Muslim dummy. As the results show, at least one subindex is significant when we control for religion. One could argue that what matters is how religion is practiced in the considered regions, and that the SIGI and the subindices might capture regional practice of religion. Therefore, we re-estimate all regressions including interactions between the religion and region dummies. The results for the SIGI and the subindices remain unchanged, suggesting that they capture something different from religion and the regional practice of it.<sup>25</sup>

## **3.6 Conclusion**

This study presents several answers to the question why we should care about social institutions related to gender inequality beyond the intrinsic value of gender equality. We derive hypotheses from existing theories and empirically test them with linear regression at the cross-country level using the newly created Social Institutions and Gender Index (SIGI) and its subindices. Our results show that social institutions related to gender inequality are associated with lower female secondary education, higher fertility rates, higher child mortality and lower levels of governance measured as voice and accountability and rule of law. We find that apart from geography, political system, the level of economic development and religion, one has to consider social institutions related to gender inequality to better account for differences in important development outcomes.

<sup>24</sup>As shown in Table 5.27 the Pearson correlation coefficient between the subindices Ownership rights and Civil liberties is 0.36.

<sup>25</sup>The results are available upon request.

The empirical estimation follows a two-step procedure for each outcome measure. First, the focus is to examine the explanatory value of the SIGI. In the specifications including all control variables, the SIGI is significant in the regressions for the measures of governance like voice and accountability and rule of law. If one interprets the SIGI as a summary measure of lack of power of women in all spheres of society then it seems that when women have more power, governance is better.<sup>26</sup> In the case of female secondary schooling, fertility rate and child mortality the SIGI turns out to be insignificant in the complete specifications.

Secondly, as the SIGI is a broad measure of social institutions related to gender inequality, we investigate which particular dimension of social institutions is significantly related to the chosen development outcomes, using the complete specifications. The subindex *Family code* is negatively associated with female education and positively with fertility and child mortality. These results suggest that social institutions that deprive women of their autonomy and bargaining power in the family do matter for female education, fertility and child mortality. The subindex *Civil liberties* is the dimension of social institutions that is significantly related to the governance component voice and accountability. The freedom of women to participate in public life seems to increase the quality of governance of a society as women tend to be more socially oriented than men and are a group that cross-cuts cleavages in general. The rule of law component of governance is negatively related to the subindices *Civil liberties* and *Ownership rights*. The two subindices proxy where this power comes from, with *Ownership rights* measuring access to property and *Civil liberties* measuring the freedom to participate in public life. Assuming that women's attitudes are different from those of men and that they challenge personal rule, women's power in a society is a relevant factor in increasing the rule of law.

Although the subindices *Family code*, *Ownership rights* and *Civil liberties* are the relevant dimensions of social institutions related to gender inequality for the response variables considered in this study, this does not mean that the other two subindices *Son preference* and *Physical integrity* are not important intrinsically or instrumentally for other outcomes.

<sup>26</sup>The association between two composite measures like the SIGI and the governance indicators has to be interpreted carefully.

Case studies investigating the mechanisms between social institutions and the outcome variables are necessary. Our study has the limitations of any cross-sectional regression analysis as we cannot rule out omitted variable bias. Causality can never be derived from regression analysis with cross-sectional data unless valid instruments are found. Concerning the results of the subindices, these should be considered exploratory and need to be confirmed with further research, which should also include the elaboration of appropriate theories linking social institutions related to gender inequality with each of the development outcomes used in this study.

Social institutions are long-lasting and deep-seated in people's minds. Changing them is a difficult task and requires approaches tailored to the particular needs and the socio-economic context (Jütting and Morrisson, 2005). The state can certainly help attenuate the effects of social institutions through specific policies. It may set incentives to counteract social institutions, e.g. in the form of laws to fight against discriminatory practices or through the implementation of programs favoring girls and women. Micro-credit programs or subsidies targeted at mothers are good examples here. Nevertheless, changing social institutions needs more than that. It needs a thorough understanding of the power relations in a country and people that are willing to become reform drivers and initiate learning processes that should be complemented by deliberation and public discussion at all levels of society. Be it through internal or external forces, women need help to empower themselves. That is what Sen calls 'agency of women' (Sen, 1999b).

## **Chapter 4**

# **Reexamining the Link Between Gender and Corruption: The Role of Social Institutions**<sup>1</sup>

## **4.1 Introduction**

Is there a link between gender inequality and corruption in a society? The studies of Swamy et al. (2001) and Dollar et al. (2001) suggest that countries with greater representation of women in political and economic life tend to have lower levels of corruption. How can this relationship be explained?

This could be attributed to behavioral differences between men and women. As mentioned by Dollar et al. (2001), there are experimental studies and studies using survey data that find that, on average, women are less selfish and might have higher moral and ethical standards than men (e.g. Eagly and Crowley, 1986; Glover et al., 1997; Eckel and Grossman, 1998; Rivas, 2008).<sup>2</sup> If one accepts that women are less selfish and align their actions with higher moral standards than men, having women in important political and economic positions might lead to less corruption in a country.

An alternative explanation is put forward by Swamy et al. (2001), who argue that the negative relationship between women's participation and corruption could be due to self-selection. Only a few women reach powerful positions, and these women possibly gain access to these positions as they are from the "better" part of the women's distribution. From a historical perspective, Goetz (2007) claims that it is gendered access to political positions that explains why women seem to be less corrupt than men. Excluded from male patronage networks, women are restricted in their opportunities for corrupt behavior. As they are newcomers or only few in the political or business sphere, women lack familiarity with the rules of illicit exchange to their own benefit. They try to assert their position by acting honestly and trustworthily. This all leads to fewer corrupt activities by women, but as time passes and more women get access to power this effect might vanish.

<sup>1</sup>joint work with Boris Branisa

<sup>2</sup>There are empirical studies that challenge the finding that women are the "fairer sex" (e.g. Andreoni and Vesterlund, 2001; Alhassan-Alolo, 2007; Alatas et al., 2009). Another investigation highlights that when women are in a powerful position, they take decisions that are closely related to women's needs (Chattopadhyay and Duflo, 2004).

It can also be argued that the observed relationship between women's representation and corruption is spurious. Swamy et al. (2001) and Dollar et al. (2001) warn that even if one controls for other factors in the regression, the observed relationship at the cross-country level could be due to some unobserved variable which influences both female representation and corruption. For example, according to Sung (2003) it might be the political system in the form of liberal democratic institutions that influences both. Sung (2003) argues that institutions of *liberal democracy* increase women's participation in government through values like equality, pluralism, fairness and tolerance. Competitive elections, an independent judiciary and a free press, which are elementary to a liberal democratic system, guarantee transparency and hold government officials accountable, thereby reducing corruption. Therefore, the negative effect of women's representation in government on corruption is spurious and vanishes when one includes a measure of democracy in the regression, which is empirically confirmed by Sung (2003). Swamy et al. (2001) draw attention to the "level of discrimination against women" as another possible omitted variable that drives both female participation and corruption. They claim that in countries that are more corrupt there is more discrimination against women and argue that in countries where traditions and clientelism prevail, there is a preference for men in power.

In this paper, we focus on the effect of discrimination against women on corruption in a society as we have a new measure of society's attitude towards gender inequality to empirically test this relationship. Swamy et al. (2001) do not explain how this relationship operates, but several studies deal with this issue in a direct or indirect way (Tripp, 2001; Inglehart et al., 2002; Rizzo et al., 2007). The authors of these studies claim that society's attitude towards women influences how a political system functions and that it affects the positions women take in this system. Assuming that the level of corruption depends on the functioning of the political system, one could argue that society's attitude towards gender inequality has an impact on corruption.

The study of Tripp (2001) focuses on women's movements as a countervailing force to prevailing practices of corruption in Eastern and Southern Africa.<sup>3</sup> Political reforms at the beginning of the 1990s, including free and competitive elections, a multi-party system and freedom of expression and association, were not enough to give women access to powerful positions and to curtail the practices of patronage and clientelism. Women could enter the system, but they were excluded from male-dominated networks and therefore from the benefits of clientelism. However, political reforms allowed the formation of social forces. The disadvantaged women organized in autonomous movements, which were broad-based, multi-ethnic and multi-religious. These movements crosscut cleavages and started to demand transparency and the removal of clientelistic networks.

<sup>3</sup>Waylen (1993) makes a similar point for Latin America.

A similar perspective is adopted by Inglehart et al. (2002) and Rizzo et al. (2007), who state that when a society favors gender equality, there is more tolerance in general, more personal freedom and individual autonomy. The absence of these values inhibits political reforms towards a democratic system. The study of Inglehart et al. (2002) finds that gender equality is the most important part of "self-expression values" appearing in post-industrial societies, which directly contribute both to democratization and to a greater representation of women in politics. Focusing on Arab and non-Arab Muslim countries, Rizzo et al. (2007) show that even if democratic political institutions like elections, political parties or checks and balances are put in place, gender inequality can prevent these institutions from functioning well.

We empirically test on a sample of developing countries the relationship between social institutions related to gender inequality and the level of corruption, and contribute to the literature discussed above. We focus on public corruption, which refers to the misuse of public office for private gain. It comprises grand corruption, which refers to activities of top officials and big companies, and petty corruption, which refers to the activities of people at the lower end of hierarchies (Pardo, 2004). To proxy society's attitude towards gender inequality or what Swamy et al. (2001) call "level of discrimination against women" we introduce social institutions related to gender inequality into the analysis. These are long-lasting norms, traditions and codes of conduct that shape gender roles and influence the opportunities of women and men in a society. As suggested by e.g. De Soysa and Jütting (2007) and Essay 3, these guiding principles of human behavior affect development outcomes and should not be neglected in the study of a society. We measure social institutions related to gender inequality with the subindex Civil liberties proposed in Essay 2, which is based on variables from the OECD Gender, Institutions and Development Database (Jütting et al., 2008). This subindex captures society's attitude with regard to gender roles based on the freedom of women to participate in social life.

Our aim is to investigate whether society's attitude towards gender inequality matters for corruption once one takes into account the representation of women in parliament and business as well as the political system of a country. The hypothesis is that in a society where women's participation in social life is restricted, there is a higher level of corruption.

Even after controlling for democracy and political and economic participation of women, as well as for other factors, we find a robust and significant relationship between the subindex Civil liberties and the level of corruption. We show that social institutions related to gender inequality are an important factor for the study of corruption. In societies where women are deprived of their freedoms to participate in social life, corruption is higher. As should be clear from the various existing theories the exact causal mechanism behind this relationship is not obvious and it cannot be established in this study since we conduct a cross-sectional analysis. This implies that one needs to carefully investigate the context, as tackling corruption might require more than pushing democratic reforms and increasing female representation in political and economic positions. The rest of the paper is organized as follows. Section 4.2 describes the data used, the empirical estimation and the main results, which are discussed in Section 4.3.

## **4.2 Empirical Estimation and Results**

### **4.2.1 Data**

The definition of all variables and descriptive statistics are presented in Tables 5.29, 5.30 and 5.31 in Appendix 4. Measuring corruption is a complex task as it has many faces. There is public corruption, which refers to the misuse of public office for private gain, and corruption that comprises the collusion between firms or misuse of corporate assets (Svensson, 2005). Other authors differentiate between grand and petty corruption. Grand corruption refers to activities of top-officials and big companies. Petty corruption refers to the activities of people at the lower end of hierarchies (Pardo, 2004).

We use two different measures of public corruption in our estimations, both comprising grand and petty corruption. The first measure is the Corruption Perceptions Index (*CPI*) of Transparency International.<sup>4</sup> The CPI measures the perception of corruption in a country. It is based on various data sources, business surveys and expert panels about perceptions of corruption, and is a comprehensive measure that covers the different forms of grand and petty corruption in business, politics and administration. It is continuous and ranges from 0, meaning high corruption, to 10, meaning low corruption (Lambsdorff, 2006).

The second indicator is the Corruption in Government Index from the International Country Risk Guide (*ICRG*) provided by the Political Risk Services.<sup>5</sup> The ICRG index assesses the political risk associated with corruption and focuses in particular on those types of corruption that lead to instability in the political system, as they distort the economic and financial environment, put foreign investment at risk and reduce the efficiency of government and business because people come to power not because of their ability but through patronage and clientelistic practices.<sup>6</sup> Hence, this measure gives the extent of political risk of instability that is assumed to increase with corruption. It is therefore an indicator of the level of corruption only under certain conditions: whether the political risk of instability caused by corruption coincides with the level of corruption depends on the degree of tolerance towards corruption (Lambsdorff, 2006). The ICRG corruption index ranges from 0 to 6, with 0 meaning high risk and 6 indicating low risk. The Pearson correlation coefficient between the two corruption measures is 0.58 and significant, indicating that the two measures seem to capture different aspects of corruption.

<sup>4</sup>Data are available at http://www.transparency.org/policy\_research/surveys\_indices/cpi.

<sup>5</sup>Data are available at http://www.prsgroup.com/.

The subindex Civil liberties (*Subindex Civil lib.*) is one of five composite indices (the others being subindex Family code, subindex Son preference, subindex Physical integrity, subindex Ownership rights) that measure social institutions related to gender inequality (see Essay 2). These social institutions are conceived as long-lasting norms, traditions and codes of conduct that find expression in traditions, customs and cultural practices, informal and formal laws and guide people's behavior and interaction. They shape gender roles and therefore the social and economic opportunities of men and women. We use the subindex Civil liberties in this study as it covers those social institutions that directly shape the opportunities of women to participate in social life. Hence, it reflects their opportunities to gain power in politics and economics better than the other subindices related to gender inequality. Indeed, we find that the subindex Civil liberties is the only subindex that is significant in the regression analysis. It is built out of two variables of the OECD Gender, Institutions and Development Database (Morrisson and Jütting, 2005; Jütting et al., 2008), which are freedom of movement and freedom of dress. The variables measure whether women are allowed to go outside the house and whether they are obliged to use a veil or burqa to cover parts of their body in public. Both variables are ordinal, taking the values 0, 0.5 and 1, with 0 indicating no restrictions and 1 indicating high restrictions on women.<sup>7</sup> They are proxies of civil liberties in the sense that when women are restrained from leaving the house it is difficult to imagine that they can actively participate in social, political and economic life. Wearing a veil might be a form of self-determination and expression, and different traditions, styles and customs are connected to it. However, *forced* veiling is incompatible with agency, as it might be a sign of subordination in a society and might hinder interactions with other human beings, either because women cannot interact while wearing a veil or because they can only interact if they wear one (Macdonald, 2006; Milallos, 2007). The subindex is the rescaled weighted sum of the two variables with the weights obtained from polychoric principal component analysis (Kolenikov and Angeles, 2009). The subindex goes from 0 (no gender inequality) to 1 (high gender inequality). As the subindex Civil liberties does not cover developed (OECD) countries, the subsequent empirical analysis focuses on developing countries. The list of countries covered by the subindex Civil liberties can be found in Table 5.32 in Appendix 4.

<sup>6</sup>http://www.prsgroup.com/ICRG\_Methodology.aspx#PolRiskRating

<sup>7</sup>The variable dress code takes the value 0 if fewer than 50% of women are obliged to follow a certain dress code, 0.5 if more than 50% of women are forced to follow a certain dress code and 1 if all women are obliged to follow a certain dress code, or if it is punishable by law not to follow it. The variable freedom of movement is 0 if there are no restrictions on women's movement outside the home, 0.5 if (some) women can leave home sometimes, but with restrictions, and 1 if women can never leave home without restrictions (i.e. they need a male companion, etc.).
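The construction of the subindex as a rescaled weighted sum can be illustrated with a minimal Python sketch. The weights below are hypothetical placeholders (the weights actually used come from polychoric principal component analysis and are not reported in this section), and the rescaling is assumed here to divide by the largest attainable weighted sum so that the result lies between 0 and 1.

```python
# Illustrative sketch only: the weights are hypothetical, not the
# polychoric PCA weights used for the actual subindex.
ALLOWED_SCORES = (0.0, 0.5, 1.0)


def civil_liberties_subindex(movement, dress, w_movement=0.5, w_dress=0.5):
    """Rescaled weighted sum of two ordinal scores (0 = no restriction, 1 = high)."""
    for score in (movement, dress):
        if score not in ALLOWED_SCORES:
            raise ValueError("scores must be 0, 0.5 or 1")
    weighted_sum = w_movement * movement + w_dress * dress
    # Assumed rescaling: divide by the maximum attainable weighted sum,
    # so that 0 means no gender inequality and 1 means high inequality.
    return weighted_sum / (w_movement + w_dress)


# Hypothetical country: some restrictions on movement, a binding dress code.
example_score = civil_liberties_subindex(movement=0.5, dress=1.0)
```

With equal weights the example country scores 0.75; with unequal weights the variable that loads more strongly on the underlying component would dominate the score.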

Table 4.1: Variation of the Subindex Civil Liberties Over Religion


The variables that are contained in the subindex could be considered as proxies for religion and therefore one could think that the subindex Civil liberties might be a proxy for religion as well. When investigating the variation of the subindex over religion, one observes that there is more variation within Muslim majority countries than in countries with a Christian majority or in countries with neither a Christian nor a Muslim majority (Table 4.1).<sup>8</sup> To further examine whether the subindex measures Muslim religion, we plot the subindex Civil liberties against the percentage of Muslim population in a country (Figure 4.1). It is true that countries having less than 50% Muslim population tend to have lower values on the subindex Civil liberties, with the exception of India, which scores 0.6 with about 15% of Muslim population. For countries with more than 50% Muslim population the subindex shows more variation. Noticeably, there are several countries that have more than 70% of Muslim population and the value 0 on the subindex Civil liberties.<sup>9</sup> Consequently, there is no perfect correspondence between the subindex and the percentage of Muslim population. Nevertheless, in the regressions we include a Muslim and a Christian dummy (*Muslim* and *Christian*) to control for the impact of religion, the left-out category being countries that have neither a majority of Muslim nor a majority of Christian population.<sup>10</sup>

Figure 4.1: Scatter Plot: Subindex Civil Liberties Against Percentage of Muslim Population

<sup>8</sup>The variable freedom of movement varies over all three religious categories, while the variable freedom of dress has almost no variation in countries having a Christian majority or countries without Christian or Muslim majority, except for India and Sri Lanka.

To account for female representation, which is highlighted by e.g. Swamy et al. (2001) and Dollar et al. (2001), we include three measures of female representation. We take data from World Bank (2009) on the proportion of female legislators (*Parliament*), the female share in professional, technical, administrative and managerial positions (*Managers*),<sup>11</sup> and women's share of labor force (*Labor force*).

<sup>9</sup>Albania, Azerbaijan, Gambia, Guinea, Kyrgyz Republic, Mali, Morocco, Niger, Senegal, Sierra Leone, Tajikistan, Tunisia, Turkmenistan, Uzbekistan.

<sup>10</sup>As Muslim religion is related to the subindex we also use the percentage of Muslim population instead of the two religion dummies in the regressions. The results are unchanged.

To capture democracy we choose the Electoral Democracy index (*Electoral democ.*) of Freedom House (2008) that takes the value 1 if there are competitive, universal, free and secret elections and a multiparty system. An alternative measure is the Polity2 index of the Polity IV Project that we use to check the robustness of the results as *Polity2* measures more closely liberal democracy (Marshall and Jaggers, 2009).<sup>12</sup> Unfortunately, it covers fewer countries than the Electoral Democracy index.<sup>13</sup> Dollar et al. (2001), Swamy et al. (2001) and Sung (2003) use either the Civil Liberties index,<sup>14</sup> the Political Rights index or the Freedom of the Press index of the Freedom House project as regressors in their empirical analysis to measure or to refine the measurement of democracy. It needs to be stressed that these measures are not without methodological problems as they include questions about bribing and other forms of corrupt behavior and are therefore by construction correlated with corruption. The Civil Liberties index includes questions on corruption that restrains free and independent media. The Political Rights index includes questions related to corruption in government. The Freedom of the Press index includes questions on the impact of corruption and bribery on content of the press. Moreover, Sung (2003) uses a rule of law index that is also problematic as rule of law is closely related to the prevalence of corruption. Therefore, from all Freedom House measures only the Electoral Democracy index is included in our regressions to account for democracy.

As additional controls we include:

• the log of GDP per capita in constant prices to control for the level of economic development as combatting corruption might be costly, and as poorer people might tend to engage more in corrupt activities (*log GDP*) <sup>15</sup> (Swamy et al., 2001);

<sup>11</sup>Both indicators have been criticized (Bardhan and Klasen, 1999; Dijkstra, 2002). In some countries, for example communist ones, parliaments lack power and the representation of women in these parliaments does not reflect actual power of women. Moreover, female representation in parliament measures representation only at the national level and ignores women's participation at other levels of the state and in civil society. A similar problem is attached to the representation of women in senior economic positions that measures only formal sectors. In addition, this indicator does not fluctuate much over years. However, given that there is a lack of data available for women's representation at the local and societal level as well as for informal economic participation and to be comparable to other studies, we use both measures.

<sup>12</sup>Current data for the Polity IV Project can be found at http://www.systemicpeace.org/polity/polity4.htm.

<sup>13</sup>We use averages over ten years to capture stability of democracy. For the 121 countries for which both Electoral democracy and Polity2 are available, the Pearson Correlation Coefficient between them is 0.90 and significant.

<sup>14</sup>The Civil liberties index from Freedom House (2008) measures civil liberties in general and is not to be mixed up with the subindex Civil liberties related to gender inequality.

<sup>15</sup>US\$, PPP, base year: 2005.


The subindex Civil liberties reflects the information available around the year 2000 and is not expected to change rapidly over time as social institutions are long-lasting and change only slowly and incrementally. For this reason, we use averages of the existing values over time in the case of all other variables to minimize the loss of observations due to missing values and to obtain a more stable value for the indicators used. For the corruption indicators representing our response variables we take averages over the years 2001 to 2005 for the CPI and in the case of the ICRG over the period 2000-2004. For the other regressors we use averages over ten years (1996-2005), with the exception of ethnic fractionalization as changes in the ethnic composition of a country in less than 20 years are rare (Alesina et al., 2003). Concerning the two democracy variables, choosing averages over ten years has the advantage of capturing the stability of a democratic system, which has been highlighted by Treisman (2007) as important for corruption. In addition, having a difference of five years between response variable and the regressors might help to alleviate endogeneity and capture delays until possible effects can be observed.
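The averaging step can be sketched as follows. The function name and the yearly values are hypothetical; the sketch only illustrates averaging whatever observations exist inside a period, so that a single missing year does not cost an observation.

```python
def period_average(yearly_values, first_year, last_year):
    """Mean of the non-missing values between first_year and last_year (inclusive)."""
    observed = [
        value
        for year, value in yearly_values.items()
        if first_year <= year <= last_year and value is not None
    ]
    # Return None when a country has no observation in the period at all.
    return sum(observed) / len(observed) if observed else None


# Hypothetical CPI scores for one country; the 2003 value is missing.
cpi_scores = {2000: 2.0, 2001: 2.5, 2002: 3.0, 2003: None, 2004: 3.5, 2005: 3.0}
cpi_average = period_average(cpi_scores, 2001, 2005)
```

The same helper could be applied with the window 2000 to 2004 for the ICRG and 1996 to 2005 for the regressors, which is what produces the lag between regressors and response variable described above.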

### **4.2.2 Empirical Estimation**

We empirically test with multiple linear regressions whether the subindex Civil liberties *s<sub>i</sub>*, which measures the freedom of social participation of women, is correlated with a response variable *y<sub>i</sub>* capturing the level of corruption, after controlling for other factors that have been described in the literature as possible determinants of corruption.<sup>16</sup> As was discussed previously, we consider that social institutions related to gender inequality are relatively stable and long lasting. Therefore, we assume that they do not depend on the response variable for the period considered.<sup>17</sup>

We run regressions as

$$y_{i} = \alpha + \beta s_{i} + \text{control variables}_{i} + \varepsilon_{i} \tag{4.1}$$

using information at the country level. We are mainly interested in testing the null hypothesis that coefficient β is zero at a statistical significance level of 10%. The control variables included to attenuate omitted variable bias are described in Table 5.29 in the Appendix. We acknowledge, however, that it is impossible to entirely rule out this problem.

To reproduce the findings from the literature, we first run a regression without the subindex Civil liberties to focus on the effects of democracy and representation of women, which have been largely discussed. In a second step, we add to the regressions the subindex Civil liberties as a measure of society's attitude towards gender inequality, as it can be argued that it is a variable that has been omitted in the previous regressions (Swamy et al., 2001). We run each specification for the two measures of corruption and use each time one of the two alternative measures of democracy. At the end, we present four regressions for each corruption indicator.

Preliminary regressions not reported here suggest that heteroscedasticity is a possible issue in our data and that there are influential observations that could drive the results. If our model is well specified, the OLS estimator of the regression parameters remains unbiased in the presence of heteroscedasticity, but the estimator of the covariance matrix of the parameter estimates can be biased and inconsistent, making inference about the estimated regression parameters problematic. Violations of homoscedasticity can lead to hypothesis tests that are not valid and confidence intervals that are either too narrow or too wide. To deal with heteroscedasticity, we run the regressions with OLS and 'heteroscedasticity-consistent' (HC) standard errors. As our sample sizes are less than 150, we use HC3 robust standard errors proposed by Davidson and MacKinnon (1993), which are better with small samples.<sup>18</sup>
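To make the difference between conventional and HC3 standard errors concrete, the following minimal sketch computes both for the slope of a regression with a single regressor. This is a simplification for illustration: the regressions in this essay include many controls, and the data below are hypothetical. HC3 inflates each squared residual by 1/(1 - h<sub>i</sub>)<sup>2</sup>, where h<sub>i</sub> is the leverage of observation i.

```python
import math


def slope_with_standard_errors(x, y):
    """OLS slope of y on x with its conventional and HC3 standard errors."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Conventional standard error, valid under homoscedasticity.
    error_variance = sum(e * e for e in residuals) / (n - 2)
    se_conventional = math.sqrt(error_variance / sxx)

    # HC3: sandwich estimator with residuals rescaled by (1 - leverage).
    variance_hc3 = 0.0
    for xi, e in zip(x, residuals):
        leverage = 1 / n + (xi - x_bar) ** 2 / sxx
        variance_hc3 += ((xi - x_bar) / sxx) ** 2 * (e / (1 - leverage)) ** 2
    return slope, se_conventional, math.sqrt(variance_hc3)


# Hypothetical data: four observations of a regressor and a response.
slope, se_conventional, se_hc3 = slope_with_standard_errors([0, 1, 2, 3], [0, 1, 2, 4])
```

Because observations with high leverage receive the largest correction, HC3 tends to be the more conservative choice in small samples with influential points.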

<sup>16</sup>Before conducting the multiple linear regression analysis, we account for the importance of GDP for corruption. We first run a simple linear regression of each corruption measure on log GDP. We then compute the estimated residuals from this regression and use them as the dependent variable in a new simple linear regression where the subindex Civil liberties is the only regressor. For both CPI and ICRG we obtain a negative and significant coefficient for the subindex Civil liberties which suggests that the subindex is able to account for something that goes beyond GDP when explaining corruption.

<sup>17</sup>In general, social institutions, i.e. normative frameworks, change only slowly and incrementally.

<sup>18</sup>Simulation studies by Long and Ervin (2000) have shown that HC standard error estimates tend to maintain test size closer to the nominal alpha level in the presence of heteroscedasticity than OLS standard error

For all the regressions, we check whether the results concerning the subindex Civil liberties are stable in three ways. First, in the multiple regressions the estimate of the effect of our main variable, the subindex Civil liberties, depends on the other explanatory variables included (Mukherjee et al., 1998). We therefore also try a simpler model to confirm that the estimated coefficient of the subindex Civil liberties is negative and statistically significant. In this smaller model, and based on the arguments presented before, we include as additional regressors the variables capturing the representation of women in society, a measure of democracy, the log GDP, religion dummies and regional dummies. This has the advantage that fewer parameters have to be estimated with the available observations.

Second, we use a bootstrap with 1,000 replications to compute a bias-corrected and accelerated (BCa) 90% confidence interval for the regression coefficients estimated with OLS, to confirm that the value zero is not contained in the confidence interval around β (Efron and Tibshirani, 1993). One of the main advantages of bootstrapping methods is that one does not have to assume a particular sampling distribution for the statistic. Third, we detect observations with high influence or leverage based on the first estimates (OLS with standard variance estimator) using Cook's distance. Cook's distance is a commonly used estimate of the influence of a data point in least squares regression, and it measures the effect of deleting a given observation. We exclude the countries identified as outliers from the sample if the value of Cook's distance is larger than 4/*n*, with *n* being the number of observations, and re-estimate Equation (4.1) on the restricted sample using HC3 robust standard errors.
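The BCa interval can be sketched in a few lines for the slope of a single-regressor model. Several choices here are assumptions made for illustration and are not taken from the text: country observations are resampled as pairs, the acceleration constant is estimated with a jackknife, and the function names and data are hypothetical.

```python
import random
from statistics import NormalDist


def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx


def bca_interval(x, y, level=0.90, reps=1000, seed=0):
    """Bias-corrected and accelerated bootstrap interval for the slope."""
    x, y = list(x), list(y)
    rng = random.Random(seed)
    n = len(x)
    estimate = ols_slope(x, y)

    # Pairs bootstrap: resample whole observations with replacement.
    draws = []
    while len(draws) < reps:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        if max(xs) == min(xs):
            continue  # slope undefined for this resample, draw again
        draws.append(ols_slope(xs, [y[i] for i in idx]))
    draws.sort()

    normal = NormalDist()
    # Bias correction: how far the estimate sits from the bootstrap median.
    z0 = normal.inv_cdf(sum(d < estimate for d in draws) / reps)
    # Acceleration: skewness of the leave-one-out (jackknife) estimates.
    jack = [ols_slope(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) for i in range(n)]
    jack_mean = sum(jack) / n
    accel = sum((jack_mean - j) ** 3 for j in jack) / (
        6 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    )

    def adjusted_quantile(p):
        z = normal.inv_cdf(p)
        shifted = normal.cdf(z0 + (z0 + z) / (1 - accel * (z0 + z)))
        return draws[min(reps - 1, max(0, int(shifted * reps)))]

    alpha = 1 - level
    return adjusted_quantile(alpha / 2), adjusted_quantile(1 - alpha / 2)


# Hypothetical data with a clearly negative slope and bounded noise.
x_data = [i / 19 for i in range(20)]
y_data = [5 - 3 * xi + 0.2 * ((7 * i) % 5 - 2) for i, xi in enumerate(x_data)]
lower, upper = bca_interval(x_data, y_data)
```

In this illustration the check passes when both interval bounds lie below zero, which corresponds to the criterion that zero is not contained in the 90% interval around β.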

One should consider that possible endogeneity of the regressor *s<sub>i</sub>* (the subindex Civil liberties), meaning that *s<sub>i</sub>* is correlated with the error term *ε<sub>i</sub>* in the regression, might lead to an estimated coefficient of *s<sub>i</sub>* that is biased. Endogeneity might arise due to omitted variables, measurement error and simultaneity (Wooldridge, 2002). The control variables included in the regression aim at minimizing omitted variable bias, albeit one cannot rule out this problem. We do not find it plausible that there are measurement errors in *s<sub>i</sub>* which are related to the unobserved 'true' social institutions. Simultaneity could arise if *s<sub>i</sub>* is determined simultaneously with the dependent variable *y<sub>i</sub>*. As was discussed previously, social institutions related to gender inequality *s<sub>i</sub>* are relatively stable and long-lasting. Hence, it is unlikely that the response variable *y<sub>i</sub>* influences *s<sub>i</sub>*.

estimates that assume homoscedasticity. These authors recommend the use of HC3 robust standard errors, especially for sample sizes less than 250, as they can keep the test size at the nominal level regardless of the presence or absence of heteroscedasticity, with only a minor loss of power associated when the errors are indeed homoscedastic. We acknowledge that heteroscedasticity-consistent standard errors are not a panacea for inferential problems under heteroscedasticity. As pointed out by some authors, there are limitations and trade-offs in these estimators (e.g. Kauermann and Carroll, 2001; Wilcox, 2001).

### **4.2.3 Results**

Results for the CPI as the first measure of corruption are presented in Table 4.2. Specifications (1) and (2) do not include the subindex Civil liberties. In both specifications, neither of the democracy variables, Electoral democracy and Polity2, is significant. Of the three measures of representation of women, only Parliament is significant and positively related to the CPI in specification (1), where Electoral democracy is the measure of democracy. Of the control variables, only log GDP has a significant and positive coefficient. In specifications (3) and (4) the subindex Civil liberties is added as a new regressor to the former specifications. Its coefficient is negative and significant in both. Both democracy variables as well as the measures for participation of women in the economy are not significant. Only Parliament carries a positive and significant coefficient when Electoral democracy is used (specification (3)). In the same specification (3) two control variables besides log GDP become significant: British colony and the regional dummy for ECA. For all four specifications the adjusted R-square is around 0.5.

Table 4.3 shows the results when ICRG is used as the measure of corruption. For all 4 specifications (1-4), none of the variables reflecting representation of women and none of the democracy measures is significant. Interestingly, log GDP is also insignificant in all specifications, whereas it is always significant when the CPI is used as measure of corruption. Openness is the only control variable which is significant in all specifications. Important for the results of this paper, the subindex Civil liberties is significant in specifications (3) and (4), and adding it to the corresponding regressions yields values for adjusted R-square that are noticeably larger than without it. It must be noted, however, that the obtained values for adjusted R-square for the regressions with the ICRG are lower than for the CPI (between 0.2 and 0.3 for the ICRG and around 0.5 for the CPI), suggesting that the model is not able to explain much of the variation of the political risk of instability due to corruption.


Table 4.2: Linear Regressions With Dependent Variable CPI

HC3 robust standard errors in brackets.

Regional dummies included in all estimations.

\* *p* < 0.10, \*\* *p* < 0.05, \*\*\* *p* < 0.01


Table 4.3: Linear Regressions With Dependent Variable ICRG

HC3 robust standard errors in brackets.

Regional dummies included in all estimations.

\* *p* < 0.10, \*\* *p* < 0.05, \*\*\* *p* < 0.01

Using a simpler model does not change the results for the subindex Civil liberties and the variables measuring representation of women and democracy. These findings also withstand the two other robustness checks. First, we confirm with bias-corrected and accelerated (BCa) confidence intervals that in all cases the value zero is not contained in the 90% confidence interval around the regression coefficient of the subindex Civil liberties. Secondly, when we exclude outliers (6 to 7 countries) and re-run specifications (3) and (4) for both corruption measures, the subindex Civil liberties remains significant in all estimations. It is worth mentioning that for every restricted sample, the adjusted R-square is higher than in the corresponding complete sample.<sup>19</sup>
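The outlier check based on Cook's distance and the 4/*n* cutoff can be sketched for a single-regressor model as follows. The data and function names are hypothetical, and the sketch omits the control variables and the HC3 standard errors used in the actual re-estimation.

```python
def fit_line(x, y):
    """Return (intercept, slope) of a simple least-squares fit."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - slope * x_bar, slope


def cooks_distances(x, y):
    """Cook's distance of every observation in a simple regression."""
    n, p = len(x), 2  # p counts the estimated parameters (intercept, slope)
    intercept, slope = fit_line(x, y)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    mse = sum(e * e for e in residuals) / (n - p)
    distances = []
    for xi, e in zip(x, residuals):
        leverage = 1 / n + (xi - x_bar) ** 2 / sxx
        distances.append(e * e / (p * mse) * leverage / (1 - leverage) ** 2)
    return distances


def split_by_influence(x, y):
    """Flag observations whose Cook's distance exceeds 4/n; return the rest."""
    cutoff = 4 / len(x)
    distances = cooks_distances(x, y)
    flagged = [i for i, d in enumerate(distances) if d > cutoff]
    kept = [i for i, d in enumerate(distances) if d <= cutoff]
    return flagged, [x[i] for i in kept], [y[i] for i in kept]


# Hypothetical data: a clean linear relation with one distorted observation.
x_all = list(range(10))
y_all = [2 * xi for xi in x_all]
y_all[9] = 40
flagged, x_kept, y_kept = split_by_influence(x_all, y_all)
restricted_intercept, restricted_slope = fit_line(x_kept, y_kept)
```

Re-estimating on the restricted sample shows how much a single influential observation can move the coefficient, which is the concern this robustness check addresses.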

Summarizing the results, when we do not include the subindex Civil liberties we find that from all variables for representation of women only Parliament is significant in the case of the CPI as long as Electoral democracy is used as measure of democracy. If one uses Polity2 instead, Parliament becomes insignificant. None of the democracy measures turns out to be significant. When we include the subindex Civil liberties, the results for representation of women and the democracy variables stay unchanged. Neither representation of women, except Parliament in the case of CPI when Electoral democracy is used, nor the democracy variables are significantly related to corruption. The main result concerning the subindex Civil liberties is that even after controlling for democracy and for measures of political and economic participation of women as well as for other factors, we find a robust and significant relationship between the subindex Civil liberties, which reflects society's attitude towards gender inequality, and the level of corruption. Social institutions favoring gender inequality are associated with higher levels of corruption.

## **4.3 Conclusion**

The literature investigating the link between gender and corruption finds that there is a relationship between female representation in political and economic life and the level of corruption in a country. However, some studies warn that the observed relationship may be due to omitted variable bias. One variable that might influence both participation of women and corruption is liberal democracy (e.g. Sung, 2003). We introduce a further omitted variable that has either been neglected in the literature or not been adequately dealt with because of insufficient data. Swamy et al. (2001) refer to this as the "level of discrimination against women" and proxy it with the gaps in educational attainment and life expectancy between men and women. We use the subindex Civil liberties, which we consider a better proxy of the "level of discrimination against women" as it captures social institutions that restrict women in their freedom to participate in public life and reflect society's attitude towards gender inequality. The subindex measures underlying institutions and not outcomes of these institutions, as do the variables used by Swamy et al. (2001).

When we replicate the findings of the literature for our sample of developing countries without the social institutions indicator, the results support the hypothesis of Sung (2003) and others that, when liberal democracy (in our case measured with Polity2) is considered in the regression, the representation of women in political and economic life is insignificant. However, Sung's hypothesis is weakened by the fact that there is no statistically significant association between democracy and corruption. Consequently, our statistical results support neither Sung's arguments nor the arguments put forward by Swamy et al. (2001) and Dollar et al. (2001) that representation of women is negatively related to corruption.<sup>20</sup> These results make it difficult to interpret social institutions related to gender inequality as an omitted variable when one investigates the relationship between representation of women in society, democracy and corruption.<sup>21</sup>

<sup>19</sup>Results for all the robustness checks are not reported here, but are available upon request.

Once we include the subindex Civil liberties as a regressor, we find that after controlling for representation of women in political and economic life and for democracy, it has a robust negative and significant effect on corruption. Consequently, the main finding of this study is that in countries where social institutions inhibit the freedom of women to participate in social life, the level of corruption is higher.

Admittedly, one has to be cautious with these results. Interpretations for these findings in the light of the theories discussed are difficult, and country or regional studies are needed. Measurement is another relevant issue as the concepts of social institutions, democracy, participation of women and corruption are all hard to operationalize. Finally, it cannot be ruled out that another factor, which has been neglected from the analysis, shapes the results.

Nevertheless, we derive one policy implication from this study, which should be mainly targeted at developing countries. In a context where social institutions deprive women of the freedom to participate in social life, neither political reforms towards democracy nor the representation of women in political and economic positions might be enough to reduce corruption. How women are treated in a society is not only important for them, but has major implications for the functioning of the whole society.

<sup>20</sup>Once again, our sample includes only developing countries, while the other studies include developed countries as well.

<sup>21</sup>We have estimated with multivariate regressions, not reported here, whether there is (1) a relationship between democracy and the subindex Civil liberties and (2) a relationship between representation of women in society and the subindex Civil liberties in our sample of developing countries, but did not find significant results.

## **Chapter 5**

# **Health Inequality in Bolivia: The Role of Indigenous Heterogeneity**<sup>1</sup>

## **5.1 Introduction**

Improving child health is a priority issue in achieving the Millennium Development Goals (MDGs). Goal 4 is exclusively dedicated to the reduction of child mortality. In developing countries, child health status is often linked to ethnic origin. Especially in Latin America, ethnic and racial divisions exist all over the continent and it is widely acknowledged that people of indigenous origin are still a socially disadvantaged group suffering more from marginalization, poverty and health problems than the non-indigenous population (Hall et al., 2006; Stephens et al., 2006). Although at the end of 2004 the General Assembly of the United Nations proclaimed a Second International Decade of the World's Indigenous People, starting in 2005, as a response to the problems that indigenous people face, the MDGs are too general and fail to incorporate the indigenous face of poverty and health problems (Telles, 2007).

Health inequity plays a major role for indigenous people (Braveman and Tarimo, 2002; Stephens et al., 2006). It refers to social inequality in health that arises because of social disadvantages associated with characteristics like gender, ethnicity, geographical location, economic, political resources, etc. Health inequity harms the affected people, and has a damaging effect on the welfare of a country as it contributes to the spread of diseases not only among the disadvantaged but also among the more privileged groups. Possible cost savings through preventive measures are not realized. Health inequity also means that the labor productivity of parts of the society decreases. In general, it is an impeding factor for development (Braveman and Tarimo, 2002). This is one reason why the World Health Organization has set

<sup>1</sup>joint work with Elena Gross

itself the target of making inequities in health visible so as to overcome existing inequalities (WHO, 2008).

This study analyzes inequality in child health between indigenous and non-indigenous people in Bolivia to explore whether and how indigeneity should be dealt with in order to achieve improvements in child health and attain the MDGs. We focus on Bolivia because the indigenous population constitutes more than half of the total. According to the last census (2001) there were about 3.9 million indigenous people living in Bolivia, which corresponds to 62% of the total population (Layton and Patrinos, 2006; Pozo et al., 2006).<sup>2</sup>

The existence of gaps in health between the indigenous and non-indigenous population of Bolivia is confirmed by the literature (e.g. UDAPE and OPS, 2004; Pozo et al., 2006; PAHO, 2007). Overall infant mortality, incidence of child diseases like measles and rubella, diarrheal diseases and malnourishment are higher for the indigenous groups; adult indigenous persons' health status is lagging behind the health status of the population with Spanish ancestors; and native people are disadvantaged in access to medical care (UDAPE and OPS, 2004; Pozo et al., 2006; PAHO, 2007). However, these studies mostly conduct a descriptive and bivariate analysis, which has the shortcoming that other factors that might be related to both ethnic origin and health, such as poverty, urban-rural differences, geographical location and other household-related characteristics, are not taken into account.

If one wants to analyze indigeneity and health in Bolivia, it cannot be completely ruled out that native origin might be only a proxy for other characteristics like wealth, geographical setting, or living in an urban or rural area, which all have the potential to cause inequalities in health. In Bolivia, these characteristics are strongly associated with being indigenous (see Tables 5.1 and 5.2). Using the DHS data, we find that the indigenous population makes up over 70% of the total rural population, whilst almost 80% of the non-indigenous population resides in urban areas. Urban-rural differences have to be considered in particular because of better infrastructure and provision of public services in urban areas. The urban population may therefore have advantages in access to health facilities and services. Moreover, sanitation and water services as well as education and social networks that might contribute to better health are expected to be of higher quality in less sparsely populated urban areas (Heaton and Forste, 2003).

<sup>2</sup>According to our estimations using the Demographic and Health Survey (DHS) data in 2003, they make up 46%.


Table 5.1: Population Shares by Indigenous Groups

Read as "percentage of the non-indigenous/indigenous/Quechua etc. population". Urban-rural as well as high plains-valleys-lowlands add up to 100%. (a) **"Indigenous"** consists of Quechua, Aymara, Guarani and other in the DHS 2003. (b) The category **"Other groups"** comprises Guarani and other in the DHS data 2003.

Source: DHS 2003, own estimations

Considering the regional distribution of these population groups, one observes a pattern of location and settlement. This is important as climatic conditions, agricultural production and food availability differ by region so that nutritional patterns, diseases and access to health care vary across location (e.g. Pérez-Cueto et al., 2009). Bolivia consists of three regions corresponding to three distinct ecozones that differ according to health conditions and opportunities for production. The semi-arid high plains of the Andes in the western part of the country are characterized by cool temperatures and frost, infertile soils and irregular rainfall that limit farming activities to raising livestock for wool production and the cultivation of crops like potatoes and cereals that can withstand the conditions. Mining is another major activity as there are still deposits of e.g. tin, zinc and silver. The high altitude can have a negative influence on health status as low oxygen concentration and low atmospheric pressure, cold and radiation might negatively affect children's growth (Morales et al., 2004). The fertile valleys in the south-eastern Andes have more moderate to semi-tropical temperatures, making traditional agriculture in the form of dairy farming and the cultivation of crops easier. The eastern lowlands are mainly tropical except for the semi-arid region of the Chaco and provide fertile grounds for commercial agriculture and cattle ranches. Moreover, oil and natural gas deposits exist in this region (Liberato et al., 2006; The PRS Group, 2008).<sup>3</sup> Indigenous people are concentrated in the high plains with about 50% of all indigenous people followed by the valley region with about 37%. In the high plains they account for more than 60% and in the valleys for about 56% of the total population. About 50% of the non-indigenous population has its residence in the more prosperous but less settled region of the lowlands and they account for over 80% of the population there.

Indigeneity can also be used as a proxy for poverty (Stephens et al., 2006). A lower socioeconomic level is associated with a higher risk of infections and diseases due to bad nourishment and with lower access to health services and treatment (Marmot, 2005; PAHO, 2007).<sup>4</sup> In 2002, poverty rates reached 73.9% among the indigenous population of Bolivia whereas only 52.5% were poor among the non-indigenous population (Pozo et al., 2006). Additionally, Table 5.3 presents the distribution of some household and maternal characteristics over ethnic origin using variables contained in the DHS. The figures suggest that ethnic origin might foremost capture differences in years of education of the mother or in mother's knowledge about health.

<sup>3</sup>For a good summary of the geographical conditions see http://www.fao.org/ag/AGP/AGPC/doc/Counprof/Bolivia/bolivia.htm, date of access May, 2010.

<sup>4</sup>20% of the poorest quintile have access to health services in Bolivia. In the second poorest quintile 45% of the population have access. Only Guatemala and Peru rank lower (PAHO, 2007).


Table 5.3: Distribution of Maternal and Household Characteristics by Ethnic Origin

Based on these examples it becomes obvious that one needs to go beyond descriptive bivariate analysis to detect whether it is ethnic origin, residing in urban or rural areas, living in a certain geographical location, wealth or other household related characteristics that make a difference for health outcomes.

There are investigations that study health using a multivariate regression framework controlling for other relevant factors like altitude or wealth. Mayer-Foulkes and Larrea (2005) conducted a study on Bolivia. They base their inquiry on health concentration indices and a decomposition of inequality controlling for education and health. To proxy for health they build four indices taking as input variables on maternal and child health from Demographic and Health Survey (DHS) data (1997). These are a health knowledge index, a health service use index, a health status index and a summary measure that combines the three indices. They find health inequalities related to ethnic origin. Indigenous people living primarily in rural areas and having a lower educational status suffer more from health problems. Larrea and Freire (2002) investigate social inequality in child malnutrition in Bolivia, Ecuador, Peru and Colombia using multivariate regressions. In the case of Bolivia using DHS data (1997), indigenous people are found to have twice as high prevalence rates of stunting as their non-indigenous counterparts, with indigenous people in the highlands suffering more than those living in the lowlands. Morales et al. (2004) also focus on malnutrition in Bolivia. In a multivariate regression framework, they find that belonging to the native group of the Quechua and living at a high altitude increase malnutrition.

Although a multivariate regression framework allows for more precision concerning the influence of indigeneity, these studies have their limitations as well. Some of them focus only on malnutrition indicators, so that their implications for health are limited to this problem (Larrea and Freire, 2002; Morales et al., 2004). Others like Mayer-Foulkes and Larrea (2005) use composite measures of health that make it difficult to derive policy recommendations when it comes to prevention strategies for particular diseases. Another shortcoming of these studies, except for that of Morales et al. (2004), is that they fail to differentiate between the ethnic groups living in Bolivia.

The indigenous population of Bolivia is not homogeneous but consists of distinct communities which comprise different cultures, customs, traditions, and beliefs (Layton and Patrinos, 2006). Over 30 different groups live on the Bolivian territory. The largest indigenous groups are the Aymara (about 17% DHS 2003) and the Quechua (about 27% DHS 2003) followed by the Guaraní and other groups (about 1% or less each DHS 2003). The Aymara and the Quechua populate the high plains and the valleys. About 90% of the Aymara and about 40% of the Quechua live in the high plains. 58% of the Quechua live in the valleys. The Quechua group makes up the largest share of the poor population followed by the Aymara although over 50% of the other indigenous groups are poor. These examples show that neglecting heterogeneity within the indigenous population might mask differences and complicate policy implications. Even then native origin remains a black box. For example, Morales et al. (2004) cautiously attribute differences in malnutrition outcomes between Aymara and Quechua to cultural patterns as genetic differences between the two groups should be minimal. Underlying social norms and culture might result in differences in behavior between the ethnic groups which might affect health outcomes. Although the investigations control for household characteristics, there is still a need to go beyond the indigenous, Quechua or Aymara dummy.

Using several indicators on childhood diseases and overall morbidity (diarrhea, stunting and under-five mortality) and data on vaccinations (DPT/Polio, measles, tuberculosis/BCG) taken from the DHS 2003, this study seeks to contribute to this literature, focusing on five major research questions:

*(1) Is there health inequality between indigenous and non-indigenous children?* According to the previous discussion, one would expect that for every health indicator indigeneity is associated with a higher probability of suffering from a disease or a worse health status and a lower probability of receiving vaccination.


*(2) Does the indigenous dummy mask heterogeneity in health outcomes between children of different indigenous origin?* As there are different native people in Bolivia that live in different areas of the country the hypothesis is that there are significant differences in health outcomes between native groups. As the Aymara and Quechua people are the largest native groups in Bolivia we investigate whether both or only one of these two groups is significantly different from the non-indigenous population. When exploring the differences between the groups we again take into account factors like urban-rural differences, poverty and regional location as well as household and maternal characteristics to check whether these differences remain.

*(3) Is the indigenous dummy a proxy for urban-rural differences, poverty and regional location, so that the effect of indigeneity vanishes if one takes these factors into account?* The effect of indigeneity should capture the effect of these factors. However, based on the literature, the hypothesis is that ethnic origin has an effect on health even if one controls for regional location, urban-rural differences and wealth.

*(4) Do statistically significant differences between indigenous and non-indigenous children disappear if one controls for household characteristics and characteristics of the mother?* Underlying social norms and culture might be reflected in household characteristics and characteristics of the mother related to education, health knowledge and access to health care, and one aim of the study is to explore whether this is the case.

*(5) Is health inequality related to wealth so that within the indigenous and non-indigenous group diseases are concentrated among the poor?* Concentration indices have become standard tools of health inequality analysis and are therefore not neglected in this study (e.g. O'Donnell et al., 2008; Kakwani et al., 1997; Wagstaff et al., 1991). We compute these indices to complement the former investigation by a focus on health inequality within groups to get a more refined picture of the situation in Bolivia. The expectation is that there is concentration of ill-health among the poor and of vaccinations among the rich. To contribute to this literature we explore whether this type of health inequality is found for the different indigenous and non-indigenous groups.

To answer these questions, for each health variable we start with a bivariate analysis of contingency tables and run multivariate regressions to investigate between-group (ethnic origin) inequality in health. Then we compute concentration indices to estimate health inequality related to wealth within these groups. For under-five mortality we take a slightly different approach, which allows us to account for the problem of censoring. Doing such a case study and combining all these methods is a necessary step in order to detect disparities and the driving forces of child health inequality in Bolivia. It might help to derive policy implications, for example to design intervention strategies, to identify target groups and create equity-promoting health systems (Braveman and Tarimo, 2002; Marmot, 2005; WHO, 2008).

The main findings of this study are the following. First, the bivariate analysis of health inequality due to ethnic origin hides possible relationships with other variables that might be proxied by the indigenous dummy. Therefore, conducting a multivariate analysis is a necessary exercise to get a precise picture. Secondly, the indigenous dummy masks variation within the indigenous group. Consequently, using dummies for different ethnic groups like the Aymara and the Quechua gives valuable information. Thirdly, dummies for these different ethnic groups also capture effects of other variables; in particular, characteristics of the mother should be accounted for when analyzing health inequality in Bolivia. Finally, the results are dependent on the health indicator under examination. Findings differ depending on whether one uses indicators for childhood diseases and morbidity or vaccination variables.

The next section deals with the data and variables used in this study. Section 5.3 presents the methods of the health inequality analysis and in section 5.4 the results are described. The last section concludes.

## **5.2 Data**

For the analysis of health inequality, we use the Demographic and Health Survey (DHS) from 2003, the latest one available for Bolivia. The data set contains detailed information on birth records, anthropometric measures, several measures of childhood diseases and data on vaccinations. It also provides a set of dummies on durable goods and housing conditions that capture assets which are used to proxy wealth.

To measure health we use different outcome and health-related behavior variables. The DHS provides information on the prevalence of diarrheal diseases *(diarrhea)* in the last two weeks, which is a typical measure used in the health economics literature to capture childhood diseases and overall morbidity (Mayer-Foulkes and Larrea, 2005; PAHO, 2007; WHO, 2008). As a further indicator for morbidity, the DHS also offers data on *stunting*, defined as low height for age. We use this as evidence for chronic malnourishment, which affects the development of a child as a whole. Suffering from malnourishment has an impact on physical and cognitive development, and causes a higher risk of diseases due to insufficient vitamin and nutrient intake. Moreover, children with slow body growth are exposed to a higher risk of overweight in later years which is associated with further non-communicable diseases like diabetes (PAHO, 2007).<sup>5</sup> We complement this information on childhood diseases and overall

<sup>5</sup>Moreover, low height for age as the indicator for malnourishment has the advantage that in measuring the growth of children self-reporting bias is low. Furthermore, it is now established that distributions of the heights of healthy children are comparable (Sahn and Younger, 2006). For the calculation

morbidity with demographic information that we use to calculate under-five mortality, which is the most reliable indicator of child health.<sup>6</sup>

The DHS also contains variables on communicable diseases, in particular data on their prevention through vaccinations. Communicable diseases are still one of the major causes of child mortality in less-developed countries (Lopez et al., 2006).<sup>7</sup> Most of these so-called childhood cluster diseases like diphtheria or tuberculosis are preventable through vaccinations (Lopez et al., 2006). Therefore, indicators on vaccinations are useful measures to approach health inequities at a prevention level. Low levels of prevention can be associated either with problems in access to health services or with a lack of demand for vaccinations.<sup>8</sup> From the DHS, we take the available data on vaccinations against diphtheria, pertussis, tetanus (DPT) and polio *(DPT/Polio)*, data on vaccinations against *measles* and on bacille Calmette-Guérin vaccinations against tuberculosis *(BCG)*. Although in recent years none of the above vaccination-preventable diseases has caused high death rates, the analysis of inequality in vaccinations in the case of Bolivia is relevant, as the coverage rate with vaccinations is very low (PAHO, 2007). This makes the outbreak and spreading of diseases possible.<sup>9</sup>

When coding the variables we take into account that effective immunization is reached only when vaccinations are given in a certain time span and/or enough doses are administered. Moreover, age dependence of vaccinations is considered by not using information on children who at the date of the survey have not reached the age at which the immunization should be done. To deal with factors like improvements in health technology, knowledge about health, other improvements in infrastructure and possible major economic shocks that may affect generations during their lifetime, we restrict the sample for the childhood disease indicators

see (WHO Multicentre Growth Reference Study Group, 2006). We use the special programs of the WHO (http://www.who.int/childgrowth/software/en/) using the WHO Reference 2007.

<sup>6</sup>The so called neglected diseases - Chagas, dengue fever, leprosy and leishmaniasis - are still prevalent in Bolivia. Especially in rural areas poor people face a higher risk of being infected by these diseases as housing conditions are bad (Hotez et al., 2008). These diseases are expensive to diagnose and to cure. Moreover, they demand a long recovery period or can cause disability. The Bolivian departments of Santa Cruz and Pando report new incidence of these cases each year and have a large risk group of 8% in the local population (PAHO, 2007). Data on neglected diseases is hard to obtain on a survey-based level, and is thus not included in this study.

<sup>7</sup>Indeed non-communicable chronic diseases are the main cause of death, both in developed and less-developed countries, and are related to malnutrition and bad health conditions, or can be caused by other factors (WHO, 2008). But due to data restrictions we use the available information on communicable diseases.

<sup>8</sup>In general, it is possible to get six vaccinations for 11 different diseases to prevent infections that primarily affect children. The eleven diseases are: yellow fever; diphtheria, pertussis, tetanus (DPT); polio(myelitis); hepatitis B; measles, rubella, mumps; tuberculosis (BCG); and influenza (PAHO, 2007).

<sup>9</sup>There were numerous cases of rubella (945) in 2000/01 and countable cases of pertussis (68), and diphtheria (8) in the period 2001-2005. Also tuberculosis is of great concern since a strategy to analyze risk groups and to prevent tuberculosis is lacking. There is only an active detection of tuberculosis (Stephens et al., 2006).

to children that have not reached the age of five and for the vaccination variables to children under the age of three. Table 5.4 gives the definitions and coding scheme for the health variables used in this study.<sup>10</sup>


As indigeneity is the focus of this study we have to be concerned about an appropriate measure. The most often used identifiers are language and self-identification, although geographical location may be combined with the two (Layton and Patrinos, 2006; Stephens et al., 2006). Using language can lead to an underestimation of the indigenous population as there may be indigenous descendants who declare their native tongue to be Spanish or who do not speak any indigenous language. Moreover, complications may arise due to the existence of multilingual populations. Self-identification overcomes the disadvantages of the language identifier but it can lead either to an underestimation of the indigenous group if there is discrimination against and social exclusion of indigenous people, or to an overestimation if there are benefits connected with being indigenous (Layton and Patrinos, 2006).

The DHS 2003 includes a simple measure of languages spoken that is used to identify the indigenous population and to build an *indigenous* dummy.<sup>11</sup> Additionally, we use information on different indigenous groups, as - besides the possible errors in measurement - one has to be

<sup>10</sup>When coding the variables we faced a trade-off between accuracy and having enough observations. This trade-off explains the deviations from the schedule recommended by the WHO.

<sup>11</sup>The DHS 1994 and 1998 from Bolivia only include information about the language of the household interview. We do not accept this as an adequate measure for ethnicity.

aware that the two groups, indigenous and non-indigenous, are not homogeneous in the sense that they are built out of distinct communities which comprise different cultures, customs, traditions and beliefs (Layton and Patrinos, 2006). We construct a dummy for the *Quechua* and the *Aymara* populations as they constitute the largest indigenous groups in Bolivia. We do not consider the other indigenous groups because, even if they are combined in one category, the sample size is too small to include them in a regression analysis.

The DHS provides information on household assets that we combine into an asset-index to proxy wealth using polychoric principal component analysis, which is the appropriate method for categorical variables (Kolenikov and Angeles, 2009).<sup>12</sup> Using the asset index as a long-term measure of economic wealth based on stock indicators (Deaton, 1997; Mayer-Foulkes and Larrea, 2005), we classify the population into quintiles of wealth defining the first quintile as *poor* and all other quintiles as non-poor.
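The classification step can be illustrated with a minimal sketch. It assumes that an asset-index score per household is already available (the polychoric principal component analysis itself is not reproduced here), it ignores survey weights, and the scores and the function name `poor_dummy` are hypothetical.

```python
from statistics import quantiles


def poor_dummy(asset_index):
    """Return 1 for households in the first (lowest) wealth quintile, else 0."""
    # quantiles(..., n=5) returns the four cut points that separate the quintiles;
    # the first cut point is the upper bound of the poorest quintile.
    first_cut = quantiles(asset_index, n=5)[0]
    return [1 if score <= first_cut else 0 for score in asset_index]


# Hypothetical asset-index scores (illustrative values, not DHS data).
scores = [-2.1, -1.4, -0.9, -0.3, 0.0, 0.2, 0.7, 1.1, 1.8, 2.5]
poor = poor_dummy(scores)
print(poor)
```

With real survey data the cut points would normally be computed with sampling weights, which this unweighted sketch leaves out.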

Besides using indigeneity and wealth, we make the general distinction between *urban* and rural areas in Bolivia. Urban areas are defined as towns with more than 2,000 inhabitants and differ from rural areas in the infrastructure and public services provided to the population. The urban population may have advantages in access to health facilities and services and may benefit from better sanitation and water services, education and social networks (Heaton and Forste, 2003). Alongside these considerations, we include the urban-rural division in the health analysis, as a major part of the rural population is indigenous whereas the non-indigenous population mainly lives in urban areas. Geographically, we analyze the data according to the three ecozones *high plains*-*valleys*-*lowlands* mentioned in the Introduction, as geographical characteristics are assumed to be related to health.

Additionally, we use indicators of household characteristics and characteristics of the mother as control variables in the regression analysis described below. These household characteristics are the number of children in a household under age three or five (*children und. three/children und. five*), household size (*hh size*), a dummy indicating *low water quality* based on the source of water, information on the sex of the child (*girl*) and the sex of the household head (*female hh head*). The characteristics of the mother should capture mother's knowledge about health and her ability to deal with health problems and access to health care (see, e.g. Liberato et al., 2006; Mayer-Foulkes and Larrea, 2005; Morales et al., 2004). We use mother's age at birth (*m's age at birth*), mother's education (*m's education*), a variable

<sup>12</sup>The assets used to measure housing quality and access to public services are source of drinking water, type of toilet facility, has electricity, main floor material, main wall material, main roof material, share toilet with other, household's type of cooking fuel, place for hand washing, water tap, has an exclusive room for kitchen, number of rooms excluding kitchen and bathrooms, number of bedrooms, electric water pump. Durable goods are measured with the following items: has radio, has television, has refrigerator, has bicycle, has motorcycle/scooter, has car/truck, has telephone.

capturing whether the mother knows a modern contraceptive method (*knowl. of contracept*), a variable indicating whether the mother lacks knowledge about where to go to get medical help for herself (*probl. with med. help*) and a variable indicating problems in getting medical help caused by the distance to a health facility (*probl. with distance to med. help*). In Appendix 5, Table 5.33 presents descriptive statistics for the variables used.

## **5.3 Methodology - Health Inequality Analysis**

### **5.3.1 Analysis of Health Inequality Between Groups: Contingency Tables and Multivariate Regressions**

We start with a bivariate analysis of contingency tables to investigate whether there are differences in health 'levels' between groups and check the association between two binary variables *h* and *x*, with *h* measuring health of a child and *x* measuring the group affiliation of the child.<sup>13</sup> First, we consider the *conditional distribution* of *h* at various levels of *x* and compare the conditional probability that a child is sick given that it is a member of group1, *p*<sub>1</sub> = *P*(*h* = 1|*x* = 1), with the conditional probability that the child is sick given that it is a member of group2, *p*<sub>2</sub> = *P*(*h* = 1|*x* = 0). Secondly, using the *Pearson chi-square test* we check whether the two variables *x* and *h* are statistically independent.<sup>14</sup> Thirdly, another useful summary measure of association between binary variables is the *relative risk ratio (rr)*. It compares two groups with respect to the probability that an event is occurring in the groups. In this study the relative risk ratio gives the extent to which one of the two groups is more likely to suffer from diseases. The relative risk of getting sick for group1 compared to group2 is

$$
rr = \frac{p_1}{p_2},
$$

and for group2 compared to group1 respectively. If *rr* = 1, then suffering from a disease is independent of group affiliation. If *rr* > 1 then diseases are more likely in group1 than in group2 and vice versa (Agresti, 1990).
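The three bivariate measures can be sketched for a single 2x2 contingency table as follows. The counts are hypothetical and the function names are illustrative; the p-value uses the fact that a chi-square statistic with one degree of freedom is the square of a standard normal variable, which holds only for the 2x2 case.

```python
import math


def relative_risk(sick1, n1, sick2, n2):
    """Relative risk rr = p1 / p2 of being sick in group1 versus group2."""
    p1 = sick1 / n1  # P(h = 1 | x = 1)
    p2 = sick2 / n2  # P(h = 1 | x = 0)
    return p1 / p2


def pearson_chi2(sick1, n1, sick2, n2):
    """Pearson chi-square statistic and p-value for a 2x2 table (1 degree of freedom)."""
    observed = [[sick1, n1 - sick1], [sick2, n2 - sick2]]
    total = n1 + n2
    row_totals = [n1, n2]
    col_totals = [sick1 + sick2, total - sick1 - sick2]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected frequency under independence of h and x.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom: P(chi2_1 > c) = erfc(sqrt(c / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 30 of 100 children sick in group1, 20 of 100 in group2.
rr = relative_risk(30, 100, 20, 100)
chi2, p_value = pearson_chi2(30, 100, 20, 100)
print(rr, chi2, p_value)
```

In this made-up table the relative risk is above one, yet the chi-square statistic stays below the 5% critical value of 3.841, so independence would not be rejected. The two measures answer different questions and are best read together.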

The next step of the health inequality analysis consists of estimating a multivariate regression model that allows investigating the effect of ethnic origin by holding the other group characteristics constant. We estimate a regression for the whole population including the indigenous dummy as the main regressor (Model 1) and introduce step by step the other dummy

<sup>13</sup>For a good introduction to the analysis of contingency tables see Agresti (1990).

<sup>14</sup>To evaluate the null hypothesis that the health status of a child is independent of its group affiliation, we calculate the χ<sup>2</sup> statistic, which compares observed with expected frequencies and takes the degrees of freedom of the test into account. If the probability of obtaining a χ<sup>2</sup> statistic at least as large as the observed one under a χ<sup>2</sup> distribution with the calculated degrees of freedom is smaller than 0.05, then the null hypothesis of independence is rejected.

variables poor, urban, valleys and lowlands (with high plains being the reference category) (Models 2, 3, 4 and 5). Then we add a set of control variables measuring characteristics of the household (number of children in a household, household size, sex of the household head and the child, and water quality in the household) (Model 6). Next, we change the set of control variables and include indicators capturing characteristics of the mother related to education, health knowledge and access to health care (Model 7). In the final specification both sets of control variables are considered (Model 8). This procedure of incorporating different sets of control variables is used to explore whether the indigenous dummy stays significant. If it turns insignificant the indigenous dummy might only be a proxy for the other variables.

For each health outcome *h* for child *i* with *i* = 1,...,*n*, we estimate a simple logit model of the form

$$\begin{aligned} P(h\_i=1) &= \Lambda(\alpha + \beta\_1 Indigenous\_i \\ &+ \beta\_2 Poor\_i + \beta\_3 Urban\_i + \beta\_4 Valleys\_i + \beta\_5 Lowlands\_i \\ &+ \text{controls household}\_i + \text{controls mother}\_i + \varepsilon\_i), \end{aligned}$$

where Λ is the c.d.f of the logistic distribution. This equation presents the final specification that is estimated.
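To make the role of Λ concrete, the sketch below evaluates the logistic c.d.f. for a linear index built from the five dummies. The coefficient values and the intercept are purely hypothetical, the household and maternal controls and the error term are left out for brevity, and the function names are not from the original study.

```python
import math


def logistic_cdf(z):
    """Logistic c.d.f. Lambda(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def predicted_probability(intercept, coefficients, child):
    """P(h = 1) for one child: Lambda applied to the linear index."""
    index = intercept + sum(beta * child[name] for name, beta in coefficients.items())
    return logistic_cdf(index)


# Hypothetical coefficients, chosen only to illustrate the mechanics.
coefficients = {"indigenous": 0.4, "poor": 0.3, "urban": -0.2,
                "valleys": 0.1, "lowlands": -0.1}
base_child = {"indigenous": 0, "poor": 1, "urban": 0, "valleys": 1, "lowlands": 0}
indigenous_child = dict(base_child, indigenous=1)

p_base = predicted_probability(-1.5, coefficients, base_child)
p_indigenous = predicted_probability(-1.5, coefficients, indigenous_child)
print(f"non-indigenous: {p_base:.3f}, indigenous: {p_indigenous:.3f}")
```

A positive coefficient on the indigenous dummy raises the predicted probability of the health outcome while all other characteristics are held constant, which is the comparison the stepwise Models 1 to 8 are built around.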

To go beyond the indigenous dummy and to account for possible heterogeneity within the indigenous population we re-estimate the eight regressions for the whole sample replacing the indigenous dummy with dummies for the Quechua and the Aymara group having the non-indigenous population as the reference category. We drop all observations belonging to other indigenous groups as sample size for these groups is too small to allow for statistical inference. This way we can detect whether both native groups or only one of them is significantly different from the non-indigenous population which would also give insights into the differences between these native groups.

### **5.3.2 Analysis of Health Inequality Within Groups: Concentration Indices**

The bivariate analysis of contingency tables and the multivariate regression analysis give insights into the extent of health inequality *between* groups. *Concentration indices* have become standard tools of health inequality analysis and are therefore also used in this study (e.g. O'Donnell et al., 2008; Kakwani et al., 1997; Wagstaff et al., 1991). We compute these indices to complement the former investigation with a focus on health inequality *within* groups and to get a more refined picture of the situation in Bolivia. Concentration indices measure the extent of health inequality that is systematically associated with wealth. Comparing concentration indices over population subgroups helps to examine differences in the distribution of health between different groups (Wagstaff et al., 1991; Lindelow, 2006; O'Donnell et al., 2008). The concentration index can be computed as

$$\begin{aligned} C &= \frac{2}{n\mu} \sum\_{i=1}^{n} h\_i r\_i - 1 \\ &= \frac{2}{\mu} \text{cov}(h, r), \end{aligned}$$

where *h<sub>i</sub>* indicates health of household *i*, *μ* is the mean level of health and *n* is the sample size. *r<sub>i</sub>* is the fractional rank of household *i* according to the wealth indicator *y<sub>i</sub>*. For computing the concentration indices, we recode the vaccination variables assigning the value 1 if there was no vaccination. This makes the results comparable to those of the health status variables that also have the value 1 in case of illness. If ill-health measured by health status and (missing) vaccinations is concentrated among the poor, concentration indices will be negative. If *C* = 0, then there is equality; *C* < 0 indicates pro-rich inequality so that health is concentrated among the rich, and *C* > 0 indicates pro-poor inequality describing the opposite case (O'Donnell et al., 2008).<sup>15</sup>

Estimating the coefficient β from the following regression gives the concentration index

$$2\sigma\_r^2 \left(\frac{h\_i}{\mu}\right) = \alpha + \beta r\_i + \varepsilon\_i,$$

where σ<sub>r</sub><sup>2</sup> is the variance of the fractional rank. Using the standard error of β̂ makes inference possible (Kakwani et al., 1997).<sup>16</sup>
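The three routes to the concentration index (the sum formula, the covariance formula, and the slope of the rank regression) can be checked against each other numerically. The following sketch uses a tiny invented sample of five households, ranks them by wealth with fractional rank (position − 0.5)/*n*, assumes no ties and no sampling weights, and uses population (divide-by-*n*) moments throughout. All names and data are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def fractional_ranks(wealth):
    """Fractional rank r_i = (position - 0.5) / n after sorting by wealth (no ties)."""
    n = len(wealth)
    order = sorted(range(n), key=lambda i: wealth[i])
    ranks = [0.0] * n
    for position, i in enumerate(order, start=1):
        ranks[i] = (position - 0.5) / n
    return ranks


def concentration_index_sum(h, r):
    """C = 2 / (n * mu) * sum(h_i * r_i) - 1."""
    return 2.0 / (len(h) * mean(h)) * sum(hi * ri for hi, ri in zip(h, r)) - 1.0


def concentration_index_cov(h, r):
    """C = (2 / mu) * cov(h, r), with the population covariance."""
    mu, r_bar = mean(h), mean(r)
    cov = mean([(hi - mu) * (ri - r_bar) for hi, ri in zip(h, r)])
    return 2.0 * cov / mu


def concentration_index_regression(h, r):
    """OLS slope of 2 * var(r) * h_i / mu on r_i."""
    mu, r_bar = mean(h), mean(r)
    var_r = mean([(ri - r_bar) ** 2 for ri in r])
    y = [2.0 * var_r * hi / mu for hi in h]
    y_bar = mean(y)
    cov_yr = mean([(yi - y_bar) * (ri - r_bar) for yi, ri in zip(y, r)])
    return cov_yr / var_r


# Invented data: wealth score and an ill-health dummy (1 = ill) per household.
wealth = [10, 20, 30, 40, 50]
ill = [1, 1, 0, 1, 0]
ranks = fractional_ranks(wealth)
c_sum = concentration_index_sum(ill, ranks)
c_cov = concentration_index_cov(ill, ranks)
c_reg = concentration_index_regression(ill, ranks)
print(f"C = {c_sum:.4f}")
```

In this toy sample ill-health is more frequent among the poorer households, so the index is negative, and all three computations agree up to floating-point error.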

### **5.3.3 Estimating and Explaining Under-five Mortality**

To estimate under-five-mortality we use methods of survival analysis that take into account the issue of right-censoring. Right-censoring means that the relevant event (death of a child) had not occurred until the observation time ends. Consequently, the total length of time till the event will occur is unknown. For under-five-mortality this means that a child has not yet reached the age of five at the end of the observation period and we do not know whether it will survive up to age five or not.

To calculate mortality rates, we use the *lifetable method* that is suited to the situation when grouped survival time data is observed, although the underlying survival time is continuous. After having defined the intervals used to group the data, one can calculate the survival

<sup>15</sup>When computing the concentration indices, we correct for the number of children in a household to not penalize or reward households with high numbers or low numbers of children.

<sup>16</sup>The standard error of β̂ does not take into account sampling variability of the mean of the health variable *h* that enters the left-hand side. Moreover, the fractional rank *r* has no sampling variability, but according to O'Donnell et al. (2008) taking sampling variability into account in estimating the standard error makes only little difference, so we rely on the standard error of the regression for estimating and testing concentration indices.

probabilities.<sup>17</sup> Let *T* be the non-negative continuous survival time. Define intervals of time *I<sub>j</sub>* = [*t<sub>j</sub>*, *t<sub>j+1</sub>*) with *j* = 1,...,*J*, and let *n<sub>j</sub>* denote the number of subjects at risk of dying at the start of the interval, *d<sub>j</sub>* the number of subjects that die in the interval, and *c<sub>j</sub>* the number of subjects that are censored in the interval. To handle these censored observations, one assumes that they are uniformly distributed over the interval so that half of the censored observations are at risk of dying. Therefore, one defines an adjusted number at risk of dying by reducing the number at risk by one-half of the subjects censored in the interval. The average size of the risk set in the interval is then *n<sub>j</sub>* − (*c<sub>j</sub>*/2) (e.g. Jenkins, 2005b; Hosmer et al., 2008). The life table estimator of the survival function is the product of the conditional probabilities of survival through the intervals and is obtained using the following formula:

$$S\_j = \prod\_{k=1}^{j} \frac{n\_k - (c\_k/2) - d\_k}{n\_k - (c\_k/2)}.$$

The corresponding mortality rate per thousand of children is

$$M\_j = (1 - S\_j) \times 1000.$$
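A minimal sketch of the life table calculation follows. The interval counts are invented and do not correspond to the DHS age segments or to any result in this chapter; the function names are hypothetical.

```python
def life_table_survival(intervals):
    """Cumulative survival after each interval.

    intervals: list of (n_at_risk, deaths, censored) tuples, one per interval.
    """
    survival = []
    cumulative = 1.0
    for n_j, d_j, c_j in intervals:
        # Censored subjects are assumed to be at risk for half the interval.
        adjusted_risk_set = n_j - c_j / 2.0
        cumulative *= (adjusted_risk_set - d_j) / adjusted_risk_set
        survival.append(cumulative)
    return survival


def mortality_per_thousand(survival_probability):
    """Mortality rate per 1,000 implied by a survival probability."""
    return (1.0 - survival_probability) * 1000.0


# Invented example: 1,000 children enter the first interval;
# 50 die and 100 are censored, so 850 enter the second interval.
example_intervals = [(1000, 50, 100), (850, 17, 50)]
survival = life_table_survival(example_intervals)
mortality = [mortality_per_thousand(s) for s in survival]
print([round(m, 1) for m in mortality])
```

Because the survival function is a running product, the mortality rate can only stay constant or rise as further intervals are added.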

We compute the mortality rate for the indigenous and non-indigenous population, for the Quechua and Aymara people, for urban and rural areas, and for the three regions high plains, valleys and lowlands. Under-five mortality is also computed for the quintiles of wealth to explore health inequality according to wealth.

To study the effect of indigeneity on under-five mortality in a multivariate framework we estimate a *discrete proportional hazard model with frailty* to allow for unobserved individual effects and to reduce bias due to omitted variables and measurement errors in observed survival times or regressors (Allison, 1982; Jenkins, 2005b).<sup>18</sup> We use the discrete time specification as exact survival times of the children are not known but fall within an interval of time. Let *T* be a non-negative continuous random variable representing survival time that again is grouped into intervals, in this case months, *I<sub>j</sub>* = [*t<sub>j</sub>*, *t<sub>j+1</sub>*) with *j* = 1,...,*J*. Moreover, a vector of explanatory variables *X* is observed. This vector includes the variables that entered the logit model in the previous section. The discrete time (interval) hazard function

<sup>17</sup>The definition of intervals follows the proposition of the DHS with age segments 0, 1, 2, 3 to 5, 6 to 11, 12 to 23, 24 to 35, 36 to 47, 48 to 59 months. See http://www.measuredhs.com/help/Datasets/Methodology\_of\_DHS\_Mortality\_Rates\_Estimation.htm

<sup>18</sup>ρ (reported in the regression Tables 5.39 and 5.45) is defined as the ratio of the heterogeneity variance to one plus the heterogeneity variance. If the hypothesis that ρ is zero cannot be rejected, frailty is not important (Jenkins, 2005a).

*h<sub>ij</sub>*(*X*) for month *j*, which is the conditional probability that a child *i* with *i* = 1,...,*n* dies in month *j* given that the child has not died up to this month, is then defined by

$$h\_{ij}(X) = Pr(T\_i = j | T\_i \ge j, X).$$

Specifying a functional form on how hazard depends on *X* we get

$$h\_{ij}(X) = 1 - \exp[-\exp(\gamma\_j + \beta X\_i + \varepsilon\_i)],$$

where γ<sub>j</sub> summarizes the pattern of duration dependence. In our case we assume that duration dependence is piecewise constant so that the hazard may differ between six-month intervals. To achieve this we create duration-specific dummy variables, one for each six-month interval except one, which serves as the reference. ε<sub>i</sub> is the error term that accounts for unobserved heterogeneity or frailty and is assumed to be normally distributed. Using the complementary log-log transformation we get

$$\log[-\log(1 - h\_{ij}(X))] = \gamma\_j + \beta X\_i + \varepsilon\_i.$$
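The link between the hazard and the linear index can be verified numerically: applying the complementary log-log transformation to the hazard returns the index. The sketch below uses hypothetical values for γ<sub>j</sub>, βX<sub>i</sub> and the frailty term; the function names are not from the original study.

```python
import math


def cloglog_hazard(gamma_j, beta_x, frailty=0.0):
    """Interval hazard h = 1 - exp(-exp(gamma_j + beta * X + frailty))."""
    return 1.0 - math.exp(-math.exp(gamma_j + beta_x + frailty))


def cloglog(hazard):
    """Complementary log-log transformation log(-log(1 - h))."""
    return math.log(-math.log(1.0 - hazard))


# Hypothetical baseline for one interval and two values of the linear predictor.
hazard_low = cloglog_hazard(gamma_j=-4.0, beta_x=0.0)
hazard_high = cloglog_hazard(gamma_j=-4.0, beta_x=0.5)
recovered_index = cloglog(hazard_high)
print(f"h_low = {hazard_low:.4f}, h_high = {hazard_high:.4f}")
```

The hazard always lies strictly between 0 and 1 and increases with the linear index, so a positive coefficient implies a higher probability of dying in every interval.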

This model is estimated using Maximum Likelihood estimation that takes into account censored observations.<sup>19</sup> The Likelihood is defined as

$$L = \prod\_{i=1}^n [Pr(T\_i=j)]^{c\_i} [Pr(T\_i > j)]^{(1-c\_i)},$$

where *c<sub>i</sub>* is a censoring indicator that takes the value 1 if there is no censoring and 0 if there is censoring (Allison, 1982; Jenkins, 2005b).
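In discrete time the two probabilities in the likelihood can be written in terms of the interval hazards: a child who dies in month *j* survived all earlier months and then died, while a censored child survived every month up to and including *j*. The sketch below computes one child's contribution from a given sequence of hazards; the hazard values are invented and the function name is hypothetical.

```python
def likelihood_contribution(hazards, died):
    """Likelihood contribution of one child.

    hazards: interval hazards for months 1..j, where j is the last observed month.
    died: True if the child died in month j (c_i = 1), False if censored (c_i = 0).
    """
    survived_earlier_months = 1.0
    for hazard in hazards[:-1]:
        survived_earlier_months *= 1.0 - hazard
    last_hazard = hazards[-1]
    if died:
        # Pr(T_i = j): survive months 1..j-1, then die in month j.
        return survived_earlier_months * last_hazard
    # Pr(T_i > j): survive months 1..j.
    return survived_earlier_months * (1.0 - last_hazard)


# Invented hazards for two months.
example_hazards = [0.1, 0.2]
contribution_death = likelihood_contribution(example_hazards, died=True)
contribution_censored = likelihood_contribution(example_hazards, died=False)
print(contribution_death, contribution_censored)
```

The two contributions add up to the probability of surviving the first month, since a child who reaches month 2 either dies in it or survives it.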

## **5.4 Results**

### **5.4.1 General Description and Bivariate Analysis of Health in Bolivia**

The under-five mortality rate of Bolivia, at 74 children per 1,000 live births, places the country at the bottom of the South American countries.<sup>20</sup> The indicators for childhood diseases and overall morbidity of the DHS 2003 used in this study show that in 2003 about 20% to 30% of children under the age of five were affected. About 22% of children had had diarrhea recently and about 26% of all children under five years of age were stunted. The low immunization rates for the typical vaccine-preventable diseases in Bolivia in 2003 also indicate problems either with health-related behavior or with the supply of health services. About 9.4% of children

<sup>19</sup>Estimation is done using the Stata command xtcloglog.

<sup>20</sup>Estimates of http://www.childinfo.org/mortality\_ufmrcountrydata.php (date of access, April 2010) provided by UNICEF for the year 2000 indicate that in 2000 Bolivia had a rate of 86 followed by Guyana with 72. In comparison to this, other Andean countries had a markedly lower rate. Under-five mortality rates for 2000 were 26 in Colombia, 11 in Chile, 34 in Ecuador and 41 in Peru.

under three had received all three doses of DPT and polio vaccinations in the first 12 months. Only 6.3% received a vaccination against measles between 12 and 15 months. The lowest rate is observed for BCG vaccinations, with 5.5% receiving this immunization in the first month after birth (Table 5.34 in Appendix 5).

Taking a bivariate perspective, Table 5.35 on under-five mortality in Bolivia shows that with about 103 children per 1,000 live births the indigenous group is worse off than the non-indigenous one with around 52 children per 1,000 live births. The analysis of contingency tables suggests that indigeneity increases the likelihood of suffering from diseases and decreases the probability of receiving a vaccination. Taking the extreme examples, indigenous children face a 77% higher risk of being stunted, whereas non-indigenous children are 44% more likely to get a BCG vaccination (Tables 5.36 and 5.37 in Appendix 5).

### **5.4.2 Results from Regression Analysis and Concentration Indices**

The results from the regressions and the concentration indices are discussed along the questions posed in the introduction. Answering these questions required estimating many regression models; to keep the text readable, they are presented in Appendix 5.

*(1) Is there health inequality between indigenous and non-indigenous children?* The multivariate analysis shows that whether there is a relationship between ethnic origin and health depends on the health indicator under examination. Whereas in the case of the *childhood diseases and morbidity indicators* a statistically significant relationship between ethnic origin and the probability of suffering from a disease is found, there is none if *vaccination variables* are considered. Indigenous origin is positively related to diarrhea, under-five mortality and stunting. Consequently, the hypothesis that indigenous origin is associated with a higher probability of suffering from a disease or a worse health status can be confirmed (Tables 5.38, 5.39, and 5.40). For the *vaccination variables* DPT, polio, measles and BCG vaccinations, the multivariate analysis contradicts the findings of the bivariate analysis as a significant robust effect of indigenous origin cannot be found (Tables 5.41, 5.42 and 5.43). Thus, the bivariate results hide a large amount of information in the case of vaccinations and the hypothesis that indigenous origin is related to a lower probability of receiving a vaccination has to be questioned.

*(2) Does the indigenous dummy mask heterogeneity in health outcomes between children of different native origins?* In all regressions, even those for the vaccination variables DPT, polio, BCG and measles vaccinations for which the indigenous dummy was not significant, there is at least one native group that is significantly different from the non-indigenous one. Again an interesting pattern emerges. For the childhood diseases and morbidity indicators being *Quechua* is associated with a significantly higher probability of suffering from diarrhea, under-five mortality or stunting, whereas there is no statistically significant difference between the non-indigenous and the Aymara group if one controls for the whole range of control variables (Tables 5.44, 5.45 and 5.46).

Regarding the vaccination variables, the pattern is not consistent. For DPT, polio and measles vaccinations, if one controls for all the control variables only the coefficient of the *Aymara* dummy is negative and significant at the 10 percent level. While there is no statistical difference between the Quechua and non-indigenous people, being Aymara is associated with a lower probability of receiving one of these vaccinations (Tables 5.47 and 5.49). With respect to BCG vaccinations it is the Quechua dummy which is significant in the final specification but unexpectedly the Quechua face a higher likelihood to receive a BCG vaccination than the non-indigenous people (Table 5.48).

Having found that there are differences in health outcomes between different native groups, to answer questions 3 and 4 we use the regression results where the Quechua and Aymara dummies are used instead of the simple indigenous dummy.<sup>21</sup>

*(3) Are the Quechua and/or Aymara dummies proxies for urban-rural differences, poverty and regional location, so that their statistically significant effect vanishes if one takes these factors into account?* As the Quechua and Aymara dummies do not turn insignificant if one takes these factors into account, they seem not to proxy them. Only in the regression for measles does the Quechua dummy become insignificant as soon as one controls for regions. However, the region dummies do not show any significant effect (Table 5.49).

*(4) Do statistically significant differences between Quechua and/or Aymara children and the non-indigenous children disappear if one controls for household characteristics and characteristics of the mother?* For under-five mortality and stunting the *Aymara* dummy becomes insignificant if characteristics of the mother are included. Better health knowledge of the mother (measured by fewer problems in knowing where to get medical help and a higher knowledge of modern contraceptive methods), higher education of the mother and a higher age of the mother at birth are significantly related to a lower likelihood of dying before age five or of being stunted (Tables 5.39 and 5.46).

<sup>21</sup>Remember that all observations belonging to other indigenous groups have been dropped as their number is too small to allow for inferences.

For the vaccination variables the picture is less coherent but again characteristics of the mother are important. For BCG vaccinations a similar picture to the one for the childhood diseases and morbidity indicators becomes apparent. The Aymara dummy becomes insignificant if maternal characteristics are controlled for. Better access to medical help for the mother, better health knowledge of the mother and higher education of the mother are all significantly related to a higher likelihood of receiving this vaccination. With respect to DPT and polio vaccinations, it is the Quechua dummy that turns insignificant when maternal characteristics are included. Again better health knowledge (proxied by the question whether the mother has problems with where to find medical help) and in this case a lower age at marriage of the mother are associated with a higher probability of getting vaccinated (Tables 5.47 and 5.48). For measles vaccinations and diarrhea we do not find such a result.

In conclusion, the results for mother's health knowledge, access to health care, mother's education and mother's age at birth show that the effect of ethnic origin vanishes as soon as these characteristics are taken into account. Acknowledging possible problems of endogeneity due to reverse causality or omitted variable bias, one might interpret these variables as intermediate variables, that is, as the pathways through which ethnic origin affects health outcomes.

*(5) Is there health inequality related to wealth so that within the indigenous and non-indigenous group diseases are concentrated among the poor?* Concerning wealth-related health inequality, which is reported in Tables 5.5 and 5.6, the expected pattern for ill-health, namely that diarrhea, stunting and under-five mortality are concentrated among the poor, is confirmed. In the case of *diarrhea* the concentration index is negative and significant for the non-indigenous group and the urban children. This might be due to the fact that there is more wealth and therefore more dispersion of wealth in these groups. Regarding *stunting* there are significant and negative concentration indices for all groups. Interestingly, there is again more wealth-related health inequality for the non-indigenous than the indigenous group. Consequently, the poor of the non-indigenous are affected more. For *under-five mortality* we calculated mortality per quintile and found that under-five mortality decreases steadily with quintile of wealth.

The expected pattern that *vaccinations* are concentrated among the rich cannot be confirmed. All concentration indices indicate that vaccinations are concentrated among the poor. In 1979 the Expanded Program on Immunization (EPI) was established in Bolivia with support from PAHO and other donors. This was followed by the launch of EPI II as a response to a dramatic drop in vaccination coverage in 1996. With EPI II, vaccination schemes have become part of public health insurance, new vaccines were introduced and institutional deficiencies with respect to financing, surveillance and control were tackled. Although not without difficulties, the EPI II strategy led to an increase in vaccination coverage (World Bank, 2001). The launch of EPI II certainly reached the poor better than previous initiatives. However, we do not have an explanation as to what might have led to the result of concentration of vaccinations among the poor.<sup>22</sup> It could be that nation-wide vaccination campaigns reach the poorer population better than the richer one. This should be an issue for further research.

Table 5.5: Concentration Indices

Table 5.6: Under-five Mortality per Quintile

The regression results support the findings for the concentration indices, at least for the childhood diseases. In this case, the poverty dummy carries a positive and significant sign. Regarding the vaccination variables, the poverty dummy is not significant except for the BCG vaccinations; however, the coefficient becomes insignificant when characteristics of the mother are included in the model (Table 5.42).

## **5.5 Conclusions, Further Research and Policy Implications**

This paper is about inequities in child health related to indigenous origin in Bolivia. It aims at giving a detailed picture shedding light on whether ethnic origin is decisive for childhood diseases and vaccinations. Most of the studies investigating health inequality in Bolivia recognize that ethnic origin is a decisive factor. However, these studies have their limitations.

<sup>22</sup>According to MMWR Weekly (2000), a nationwide, house-to-house vaccination campaign was initiated in September 2000 to administer all vaccines used in the routine infant vaccination schedule (diphtheria and tetanus toxoids and pertussis vaccine (DTP), measles, mumps, and rubella vaccine, and oral poliovirus vaccine).

They are limited to a bivariate analysis which should be interpreted carefully as ethnic origin is associated with other factors which might be responsible for the observed inequality by ethnic origin. Possible candidates are poverty, urban-rural differences or geographical region, and characteristics of the household and the mother. Moreover, if multiple regressions are used, it is appealing to employ only the indigenous dummy, but this has the disadvantage of hiding differences between distinct indigenous groups. Finally, most of the studies focus on only one indicator of health or combine several health indicators into an index, although the reality might be more diverse depending on the health indicator used.

In this study, we have adjusted for these limitations by conducting a multivariate regression analysis using different health indicators on childhood diseases and vaccinations, and several control variables to separate the effect of ethnic origin from other influential factors. Moreover, we have accounted for possible heterogeneity between distinct indigenous groups. The results support that it is necessary to take these issues into account.

This study yields the following insights: First, whether or not there is a relationship between ethnic origin and health depends on the health indicator under examination. Indigenous origin is positively related to childhood diseases and morbidity measured with under-five mortality, diarrhea and stunting. But for the vaccination variables a robust effect is not found.

Secondly, the indigenous dummy masks considerable heterogeneity between different native groups. Even in those regressions for which the indigenous dummy is not significant, there is at least one native group that is significantly different from the non-indigenous one. The Quechua are more likely to suffer from a bad health status than the non-indigenous children if all control variables are considered. With respect to the probability of receiving a vaccination, it is the Aymara who are worse off than the non-indigenous children (DPT, polio and measles). Notably, in the case of BCG vaccinations the Quechua are even better off than their non-indigenous counterparts.

Thirdly, when investigating health outcomes the Aymara or Quechua dummies seem not to be proxies for regional location, poverty, urban-rural differences and characteristics of the household. However, in most of the regressions (under-five mortality, stunting, DPT, polio and BCG vaccinations) one of the Aymara and Quechua dummies turns insignificant if characteristics of the mother are included. Relevant characteristics are mother's access to health services, mother's age at birth, health knowledge of the mother and mother's education, with the last two factors showing significant results in all of the regressions considered here.

Finally, health inequality related to wealth is more pronounced for the non-indigenous group than for the indigenous one. Ill-health is concentrated among the poor if childhood diseases are investigated. However, regarding vaccinations the result is rather unexpected. Vaccinations are concentrated among the poor. We do not have an explanation for this result. In Bolivia the Expanded Program on Immunization led to an improvement in vaccination coverage and supported nation-wide vaccination campaigns. However, whether these campaigns reached the poor better than the rich or whether the rich might be more skeptical with regard to vaccinations, is open to further research.

In conclusion, conducting a multivariate analysis using dummies for different ethnic groups is essential in order to get a precise picture. But as the dummies for different ethnic groups can also capture effects of other variables, searching for the factors that are behind ethnic differences is the most important task for future research to alleviate inequities in health. Is it institutional mechanisms that lead to discrimination or different cultural habits, is it household or maternal characteristics, wealth or geographical location, or something else? This study has shown that mother's education, mother's health knowledge, mother's age at birth and access to health care are all relevant and considering them makes the dummies for ethnic origin insignificant. Moreover, these variables might be the pathways ethnic origin takes in influencing health outcomes. A further investigation of how maternal characteristics and ethnic origin are related and how they interact in producing health outcomes might be of high value. Finally, the rather unexpected and incoherent results for the vaccination variables suggest that investigating these vaccination variables in a separate study could give more insights into the functioning of health services and/or health-related behavior.

However, this study has its limitations. Having only cross-sectional data does not allow us to talk about causal effects. Comparable panel data would allow better inference. Conceptually, we do not know what is behind the indigenous, Quechua, Aymara, 'other indigenous' and the non-indigenous dummies. Furthermore, measurement errors can bias the results. For example, people with different cultural backgrounds might have a different tolerance to diseases and therefore report them differently. Moreover, the results might depend on the coding of the time span when an effective vaccination should take place. Finally, the choice of the diseases is rather arbitrary and affected by the information available in the DHS. Other diseases like non-communicable or chronic diseases should be examined in further research.

A last comment should be made here. The study certainly gives evidence that in Bolivia indigenous people have a worse health status than non-indigenous people. As indigenous people make up about 50% of the population, this has important implications for Bolivia's contribution in achieving the MDGs. However, our results emphasize that a simple formula of choosing indigenous people as a target group of health interventions falls short of recognizing the realities, as the term masks heterogeneity within this group. One should go beyond ethnic origin, "Quechua" and "Aymara" and look for factors like maternal characteristics that might be the pathways through which ethnic origin produces health outcomes. Moreover, an operationalization of institutional mechanisms that are related to ethnic origin might give even more insights. This should be kept in mind when policies are designed.

# **Appendix 1**

Table 5.7: Democracies (Polity2 score>1) and Autocracies (Polity2 score<=0) Classified According to their Levels of Income and Life Expectancy, 1970

Table 5.8: Democracies (Polity2 score>1) and Autocracies (Polity2 score<=0) Classified According to their Levels of Income and Life Expectancy, 1980

Table 5.9: Democracies (Polity2 score>1) and Autocracies (Polity2 score<=0) Classified According to their Levels of Income and Life Expectancy, 1990

Table 5.10: Democracies (Polity2 score>1) and Autocracies (Polity2 score<=0) Classified According to their Levels of Income and Life Expectancy, 2000

Table 5.11: Democracies (Polity2 score>1) and Autocracies (Polity2 score<=0) Classified According to their Levels of Income and Literacy, 1970

Table 5.12: Democracies (Polity2 score>1) and Autocracies (Polity2 score<=0) Classified According to their Levels of Income and Literacy, 1980

Table 5.13: Democracies (Polity2 score>1) and Autocracies (Polity2 score<=0) Classified According to their Levels of Income and Literacy, 1990

Table 5.14: Democracies (Polity2 score>1) and Autocracies (Polity2 score<=0) Classified According to their Levels of Income and Literacy, 2000


Table 5.15: Summary Statistics (over 1970, 1975, 1980, 1985, 1990, 1995, 2000)


Table 5.16: Correlation Matrix (over 1970, 1975, 1980, 1985, 1990, 1995, 2000)

# **Appendix 2**



earmarr stands for the variable Early marriage, polyg for Polygamy, parauth is the variable Parental authority and inher is the variable inheritance. For a description of these variables, see section 2.2. The p-values correspond to the null hypothesis that the two variables are independent.

Table 5.18: Kendall Tau b: Dimension Civil Liberties


freemov stands for the variable Freedom of movement. obliveil is the variable Obligation to wear a veil in public. For a description of these variables, see section 2.2. The p-values correspond to the null hypothesis that the two variables are independent.


Table 5.19: Kendall Tau b: Dimension Physical Integrity with Missing Women

femmut stands for the variable Female Genital Mutilation, vio for Violence against women and misswom is the variable Missing women. For a description of these variables, see section 2.2. The p-values correspond to the null hypothesis that the two variables are independent.

Table 5.20: Kendall Tau b: Dimension Physical Integrity without Missing Women


femmut stands for the variable Female Genital Mutilation and vio for Violence against women. For a description of these variables, see section 2.2. The p-values correspond to the null hypothesis that the two variables are independent.



womland stands for the variable Women's access to land. womloans is the variable Women's access to loans and womprop is the variable Women's access to property other than land. For a description of these variables, see section 2.2. The p-values correspond to the null hypothesis that the two variables are independent.

Figure 5.1: MJCA for the Dimension Family Code

earmarr stands for the variable Early marriage, polyg for Polygamy, parauth is the variable Parental authority and inher is the variable inheritance. For a description of these variables, see section 2.2.

freemov stands for the variable Freedom of movement. obliveil is the variable Obligation to wear a veil in public. For a description of these variables, see section 2.2.

femmut stands for the variable Female Genital Mutilation, vio for Violence against women and misssk is the variable Missing women. For a description of these variables, see section 2.2.

Figure 5.4: MJCA for the Dimension Physical Integrity without Missing Women

femmut stands for the variable Female Genital Mutilation and vio for Violence against women. For a description of these variables, see section 2.2.

womland stands for the variable Women's access to land. womloan is the variable Women's access to loans and womprop is the variable Women's access to property other than land. For a description of these variables, see section 2.2.

### **Objectives, Properties and Proofs**

In this section, we present the objectives and properties that we consider relevant for any composite index of social institutions related to gender inequality. Moreover, we show that the proposed index fulfills all of them. We use the following notation. Let *X<sup>j</sup>*, with *j* = *A*, *B*, be the vector containing the values of the subindices *x<sub>i</sub><sup>j</sup>*, with *i* = 1,...,*n*, for country *j*.<sup>23</sup> *I*(*X*) represents the composite index.

#### **Objectives of the Index**

The objectives of the index are the following:


<sup>23</sup>In what follows, the superscript *j* will only be used if it is necessary to distinguish countries.

#### **Properties of the Index**

Some of the properties that any index should fulfill are:

	- *I*(*X*) must be defined for 0 ≤ *x<sub>i</sub>* ≤ 1, *i* = 1,...,*n*.
	- 0 ≤ *I*(*X*) ≤ 1 must hold for any *X*.
	- If *x<sub>i</sub>* = 0 ∀*i*, then *I*(*X*) = 0. If *x<sub>i</sub>* = 1 ∀*i*, then *I*(*X*) = 1.
	- (a) If *x*<sub>1</sub> increases by |△*x*<sub>1</sub>| and *x*<sub>2</sub> decreases by |△*x*<sub>2</sub>| and |△*x*<sub>1</sub>| = |△*x*<sub>2</sub>|, then *I*(*X*) must increase.
	- (b) For *I*(*X*) to remain unchanged, we must have |△*x*<sub>2</sub>| > |△*x*<sub>1</sub>|.

#### **Proofs**

The composite index *I*(*X*) is defined as

$$I(X) = \frac{1}{n} \sum_{i=1}^{n} (x_i - 0)^2.$$

The index proposed fulfills all the stated properties.
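The definition can be illustrated with a few lines of Python. This is a minimal sketch, assuming the subindex values are supplied as a plain sequence of numbers in [0, 1]; the function name `composite_index` and the example values are hypothetical and not taken from the source.

```python
from typing import Sequence


def composite_index(x: Sequence[float]) -> float:
    """Mean of the squared subindex values, (1/n) * sum((x_i - 0)^2)."""
    if not x:
        raise ValueError("at least one subindex value is required")
    if any(v < 0.0 or v > 1.0 for v in x):
        raise ValueError("every subindex must lie in [0, 1]")
    return sum(v * v for v in x) / len(x)


# Hypothetical subindex values for one country (illustrative only).
example = [0.0, 0.5, 1.0]
print(composite_index(example))  # computes (0 + 0.25 + 1) / 3
```

Squaring each subindex before averaging is what makes high values in a single dimension weigh more heavily than they would in a simple average.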

#### 1. **Support and range of** *I*(*X*)
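As a sketch under the definition of *I*(*X*) given above (the steps below are a reconstruction, not quoted from the source): every subindex lies in [0, 1], so its square does as well, and the mean of values in [0, 1] stays in [0, 1].

$$\begin{aligned} 0 \le x_i \le 1 \;&\Rightarrow\; 0 \le x_i^2 \le 1 \quad \forall i \\ &\Rightarrow\; 0 \le \frac{1}{n}\sum_{i=1}^{n}(x_i - 0)^2 \le 1 \\ &\Rightarrow\; 0 \le I(X) \le 1. \end{aligned}$$

Setting *x<sub>i</sub>* = 0 for all *i* gives *I*(*X*) = 0, and setting *x<sub>i</sub>* = 1 for all *i* gives *I*(*X*) = 1.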


#### 2. **Anonymity (symmetry)**

The value of *I*(*X<sup>j</sup>*) depends neither on the names of the subindices nor on the name of the country (*j*).

#### 3. **Unanimity (Pareto Optimality)**

If we assume that ∀*i*

$$x_i^A \le x_i^B,$$

then, since all subindex values are non-negative, we can show that

$$\begin{array}{rcl}(x_i^A)^2 &\leq& (x_i^B)^2\\ \frac{1}{n}\sum_{i=1}^n(x_i^A - 0)^2 &\leq& \frac{1}{n}\sum_{i=1}^n(x_i^B - 0)^2\\ I(X^A) &\leq& I(X^B).\end{array}$$

#### 4. **Monotonicity**

We assume that

$$\begin{aligned} I(X^A) &\le I(X^B) \\ \frac{1}{n} \sum_{i=1}^n (x_i^A - 0)^2 &\le \frac{1}{n} \sum_{i=1}^n (x_i^B - 0)^2. \end{aligned}$$

Let us suppose, without loss of generality, that subindex *x*<sub>1</sub> improves (decreases) by δ > 0 for country *A*. Then we have that

$$\frac{1}{n}(x_1^A - \delta - 0)^2 + \frac{1}{n}\sum_{i=2}^n (x_i^A - 0)^2 \le \frac{1}{n}\sum_{i=1}^n (x_i^A - 0)^2,$$

and hence

$$\frac{1}{n}(x_1^A - \delta - 0)^2 + \frac{1}{n}\sum_{i=2}^n (x_i^A - 0)^2 \le \frac{1}{n}\sum_{i=1}^n (x_i^B - 0)^2.$$

This means that

$$I(X^{A^*}) \le I(X^B)$$

with *X<sup>A∗</sup>* defined as the vector corresponding to country *A* with only one variable having improved (decreased) by δ.

#### 5. **Penalization of inequality in the case of equal means**

If we assume equal means, so that

$$\mu = \frac{1}{n} \sum_{i=1}^{n} x_i^A = \frac{1}{n} \sum_{i=1}^{n} x_i^B,$$

then we also have

$$\sum_{i=1}^{n} x_i^A = \sum_{i=1}^{n} x_i^B.$$

If we assume that the variance of *X<sup>A</sup>* is smaller than the variance of *X<sup>B</sup>* so that

$$\frac{1}{n}\sum_{i=1}^{n}(x_i^A - \mu)^2 < \frac{1}{n}\sum_{i=1}^{n}(x_i^B - \mu)^2,$$

we can show that

$$\begin{aligned} \sum_{i=1}^n \left[ (x_i^A)^2 - 2\mu x_i^A + \mu^2 \right] &< \sum_{i=1}^n \left[ (x_i^B)^2 - 2\mu x_i^B + \mu^2 \right],\\ \sum_{i=1}^n (x_i^A)^2 - 2\mu \sum_{i=1}^n x_i^A + n\mu^2 &< \sum_{i=1}^n (x_i^B)^2 - 2\mu \sum_{i=1}^n x_i^B + n\mu^2.\end{aligned}$$

As ∑<sub>*i*=1</sub><sup>*n*</sup> *x<sub>i</sub><sup>A</sup>* = ∑<sub>*i*=1</sub><sup>*n*</sup> *x<sub>i</sub><sup>B</sup>*, we have that

$$\begin{aligned} \sum_{i=1}^n (x_i^A)^2 &< \sum_{i=1}^n (x_i^B)^2\\ \frac{1}{n} \sum_{i=1}^n (x_i^A - 0)^2 &< \frac{1}{n} \sum_{i=1}^n (x_i^B - 0)^2\\ I(X^A) &< I(X^B). \end{aligned}$$
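One way to read this result, offered as a supplementary sketch rather than as part of the original proof: expanding the variance shows that the index equals the squared mean plus the variance of the subindices.

$$\begin{aligned} \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2 &= \frac{1}{n}\sum_{i=1}^{n}x_i^2 - 2\mu\,\frac{1}{n}\sum_{i=1}^{n}x_i + \mu^2 = I(X) - \mu^2, \\ I(X) &= \mu^2 + \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2. \end{aligned}$$

With equal means, the country with the larger variance across subindices therefore has the larger index value. As an illustrative example with made-up values, (0.5, 0.5) and (0.2, 0.8) share the mean 0.5, but the index is 0.25 for the first vector and 0.34 for the second.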

#### 6. **Compensation property**

In a two-variable example, let △*x*<sub>1</sub> ≤ 1−*x*<sub>1</sub>, and △*x*<sub>2</sub> ≤ 1−*x*<sub>2</sub>.

(a) We can show that if △*x*<sub>1</sub> = △*x*<sub>2</sub> = δ > 0, then

$$\begin{array}{rcl} x_2 & < & x_1 + \delta \\ 0 & < & x_1 - x_2 + \delta \\ 0 & < & 2\delta(x_1 - x_2 + \delta) \\ x_1^2 + x_2^2 & < & x_1^2 + x_2^2 + 2\delta(x_1 - x_2 + \delta) \\ \frac{1}{2}\left(x_1^2 + x_2^2\right) & < & \frac{1}{2}\left(x_1^2 + 2\delta x_1 + \delta^2 + x_2^2 - 2\delta x_2 + \delta^2\right) \\ \frac{1}{2}\left(x_1^2 + x_2^2\right) & < & \frac{1}{2}\left[\left(x_1 + \delta\right)^2 + \left(x_2 - \delta\right)^2\right] \\ I(x_1, x_2) & < & I(x_1 + \delta, x_2 - \delta), \end{array}$$

and hence we have shown that if *x*<sub>1</sub> increases by δ and *x*<sub>2</sub> decreases by δ, then *I*(*X*) must increase.
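A small numerical check of part (a) may help. The sketch below assumes the two-variable form of the index, *I*(*x*<sub>1</sub>, *x*<sub>2</sub>) = ½(*x*<sub>1</sub>² + *x*<sub>2</sub>²), and uses hypothetical values that satisfy the starting condition *x*<sub>2</sub> < *x*<sub>1</sub> + δ; the function name `index_two` is illustrative.

```python
def index_two(x1: float, x2: float) -> float:
    """Two-variable composite index: half the sum of squared subindices."""
    return 0.5 * (x1 ** 2 + x2 ** 2)


# Hypothetical values (illustrative only); note that x2 < x1 + delta holds.
x1, x2, delta = 0.5, 0.5, 0.1

before = index_two(x1, x2)
after = index_two(x1 + delta, x2 - delta)

# The derivation implies that the index rises by delta * (x1 - x2 + delta).
predicted_gain = delta * (x1 - x2 + delta)

print(before, after, after - before, predicted_gain)
```

An equal-sized improvement in one subindex therefore does not fully offset a deterioration in the other.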

(b) Let *x*<sub>1</sub> = *x*<sub>2</sub> = *x* > 0. We will show that if *x*<sub>1</sub> increases by △*x*<sub>1</sub> and *x*<sub>2</sub> decreases by △*x*<sub>2</sub> and the value of the index remains unchanged, the increase in *x*<sub>1</sub> must be smaller than the absolute value of the decrease in *x*<sub>2</sub>.

$$\begin{array}{rcl} I(x_1, x_2) & = & I(x_1 + \triangle x_1, x_2 - \triangle x_2) \\ \frac{1}{2} \left( x_1^2 + x_2^2 \right) & = & \frac{1}{2} \left[ \left( x_1 + \triangle x_1 \right)^2 + \left( x_2 - \triangle x_2 \right)^2 \right] \\ x_1^2 + x_2^2 & = & x_1^2 + 2 x_1 \triangle x_1 + \left( \triangle x_1 \right)^2 + x_2^2 - 2 x_2 \triangle x_2 + \left( \triangle x_2 \right)^2 \\ 0 & = & 2 x_1 \triangle x_1 + \left( \triangle x_1 \right)^2 - 2 x_2 \triangle x_2 + \left( \triangle x_2 \right)^2 \end{array}$$

Using the fact that *x*<sub>1</sub> = *x*<sub>2</sub> = *x*, we can rewrite this as

$$\begin{array}{rcl} 0 & = & 2x\triangle x_1 + (\triangle x_1)^2 - 2x\triangle x_2 + (\triangle x_2)^2 \\ 0 & = & 2x(\triangle x_1 - \triangle x_2) + (\triangle x_1)^2 + (\triangle x_2)^2. \end{array}$$

As 2*x* > 0, (△*x*<sub>1</sub>)<sup>2</sup> > 0, and (△*x*<sub>2</sub>)<sup>2</sup> > 0, we must have that

$$
\begin{aligned}
\triangle x_1 - \triangle x_2 &< 0 \\
\triangle x_1 &< \triangle x_2.
\end{aligned}
$$
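Part (b) can also be checked numerically by solving the unchanged-index condition for the required decrease. The sketch below is an illustration under the same two-variable setup with *x*<sub>1</sub> = *x*<sub>2</sub> = *x*; the function name `compensating_decrease` and the example values are hypothetical.

```python
import math


def compensating_decrease(x: float, dx1: float) -> float:
    """Decrease in x2 that keeps (x1^2 + x2^2) / 2 unchanged.

    Starts from x1 = x2 = x and raises x1 by dx1. The condition
    (x + dx1)^2 + (x - dx2)^2 = 2 * x^2 is solved for dx2.
    """
    remaining = 2.0 * x * x - (x + dx1) ** 2
    if remaining < 0.0:
        raise ValueError("no decrease in x2 can offset an increase this large")
    return x - math.sqrt(remaining)


# Hypothetical values (illustrative only).
x, dx1 = 0.5, 0.1
dx2 = compensating_decrease(x, dx1)

print(dx1, dx2)  # the required decrease dx2 exceeds the increase dx1
```

The required decrease in *x*<sub>2</sub> is larger than the increase in *x*<sub>1</sub>, which is the partial compensation that the property describes.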


Table 5.22: Comparison of the SIGI and the Simple Average of the Subindices


The data are sorted according to the value of the SIGI.


Table 5.23: Ranking according to the SIGI and the Five Subindices




Table 5.24: Comparison of Ranks: the SIGI and other Gender-related Indices



Data for the Gender-related Development Index (GDI) and the Gender Empowerment Measure (GEM) are from United Nations Development Programme (2006) and are based on the year 2004. The Gender Gap Index (GGI) capped and the revised Gender Empowerment Measure (GEM revised) are taken from Klasen and Schüler (2009) based on the year 2004. Data for the Global Gender Gap Index (GGG) are from Hausmann et al. (2007). The Women's Social Rights Index (WOSOC) data correspond to the year 2007 and are obtained from http://ciri.binghamton.edu/.

# **Appendix 3**


Table 5.25: Description and Sources of Variables





Table 5.26: Descriptive Statistics of the Variables Used

Table 5.27: Pearson Correlation Coefficient between the SIGI and the Subindices



Table 5.28: Correlation of the SIGI and the Subindices with the Control Variables


# **Appendix 4**

Table 5.29: Description and Sources of Variables






Table 5.30: Descriptive Statistics of the Variables Used


Table 5.31: Pearson Correlation Coefficient between Subindex Civil liberties and Control Variables


Table 5.32: Ranking According to the Subindex Civil Liberties



# **Appendix 5**


Table 5.33: Descriptives of the Variables Used in the Regression Analysis


Table 5.34: Prevalence Rates in Bolivia

Table 5.35: Under-five Mortality Rates per Thousand Live Births


Table 5.36: Contingency Tables: Childhood Diseases


Note: \* p<0.10; \*\* p<0.05; \*\*\* p<0.01. Source: DHS 2003, own estimations


Table 5.37: Contingency Tables: Vaccinations

Note: \* p<0.10; \*\* p<0.05; \*\*\* p<0.01. Source: DHS 2003, own estimations


Table 5.38: Logit Regression Using Diarrhea as Dependent Variable

Note: \* p<0.10; \*\* p<0.05; \*\*\* p<0.01, Standard errors in parentheses. Source: DHS 2003, own estimations


Table 5.39: Discrete Time Model for Under-five Mortality

Note: \* p<0.10; \*\* p<0.05; \*\*\* p<0.01, Standard errors in parentheses. Source: DHS 2003, own estimations.

Time dummies measuring six months each included.


Table 5.40: Logit Regression Using Stunting as Dependent Variable

Note: \* p<0.10; \*\* p<0.05; \*\*\* p<0.01, Standard errors in parentheses. Source: DHS 2003, own estimations


Table 5.41: Logit Regression Using DPT/Polio Vaccinations as Dependent Variable

Note: \* p<0.10; \*\* p<0.05; \*\*\* p<0.01, Standard errors in parentheses. Source: DHS 2003, own estimations


Table 5.42: Logit Regression Using BCG Vaccinations as Dependent Variable

Note: \* p<0.10; \*\* p<0.05; \*\*\* p<0.01, Standard errors in parentheses. Source: DHS 2003, own estimations


Table 5.43: Logit Regression Using Measles Vaccinations as Dependent Variable

Note: \* p<0.10; \*\* p<0.05; \*\*\* p<0.01, Standard errors in parentheses. Source: DHS 2003, own estimations


Note: \* p<0.10; \*\* p<0.05; \*\*\* p<0.01, Standard errors in parentheses. Source: DHS 2003, own estimations

Table 5.44: Logit Regression Using Diarrhea as Dependent Variable - Differentiated by Indigenous Group


Table 5.45: Discrete Time Model for Under-five Mortality - Differentiated by Indigenous Group

Note: \* p<0.10; \*\* p<0.05; \*\*\* p<0.01, Standard errors in parentheses. Source: DHS 2003, own estimations.

Time dummies measuring six months each included.


Note: \* p<0.10; \*\* p<0.05; \*\*\* p<0.01, Standard errors in parentheses. Source: DHS 2003, own estimations

Table 5.46: Logit Regression Using Stunting as Dependent Variable - Differentiated by Indigenous Group


Table 5.47: Logit Regression Using DPT/Polio Vaccinations as Dependent Variable - Differentiated by Indigenous Group

Note: \* p<0.10; \*\* p<0.05; \*\*\* p<0.01, Standard errors in parentheses. Source: DHS 2003, own estimations





Table 5.49: Logit Regression Using Measles Vaccinations as Dependent Variable - Differentiated by Indigenous Group

Note: \* p<0.10; \*\* p<0.05; \*\*\* p<0.01, Standard errors in parentheses. Source: DHS 2003, own estimations

# **Bibliography**







World Bank (2009). GenderStats. Electronic publication. http://genderstats.worldbank.org.

#### **Göttinger Studien zur Entwicklungsökonomik Göttingen Studies in Development Economics**

Herausgegeben von / Edited by Hermann Sautter und / and Stephan Klasen

Volumes 1-8 are available from Vervuert Verlagsgesellschaft (Frankfurt/M.).



www.peterlang.de